Abstract 

This article is motivated by the following satisfiability question: pick uniformly at random an and/or Boolean 
expression of length $n$, built on a set of $k_n$ Boolean variables. What is the probability that this expression is 
satisfiable, asymptotically when $n$ tends to infinity? 

The model of random Boolean expressions developed in the present paper is the model of Boolean Catalan 
trees, already extensively studied in the literature for a constant sequence $(k_n)_{n \ge 1}$. The fundamental breakthrough 
of this paper is to generalise the previous results to any (reasonable) sequence of integers $(k_n)_{n \ge 1}$, 
which enables us, in particular, to solve the above satisfiability question. 

We also analyse the effect of introducing a natural equivalence relation on the set of Boolean expressions. This 
new quotient model happens to exhibit a very interesting threshold (or saturation) phenomenon at $k_n = n/\ln n$. 

Keywords: Boolean formulas/functions; Catalan trees; Equivalence relation; Probability distribution; Satisfiability; 
Analytic combinatorics. 


1 Introduction 

For several decades, satisfiability problems have been extensively studied by computer scientists and probabilists, 
as well as statistical physicists. In this paper, we focus on the probabilistic version of satisfiability problems: what 
is the probability that a random Boolean expression is satisfiable? The answer to this question obviously depends 
on the distribution considered on the set of Boolean expressions. 

One of the most studied satisfiability problems is the 3-SAT problem. It consists in choosing uniformly at 
random an expression among conjunctions of $n$ clauses, each clause being a disjunction of three literals, where 
literals are chosen among a set of $k_n$ variables and their negations. What is the probability that such a random 
Boolean expression is satisfiable when $n$ tends to infinity? 

This question is already partially answered - see for example pQ: the following phase transition is proven. If the 
ratio $n/k_n$ of clauses to variables is small enough, then the random expression is satisfiable with probability tending to 1 when $n$ tends to 
infinity, whereas if the ratio $n/k_n$ is large enough, then this probability tends to 0. Refining this statement is the 
challenging aim of a large literature. 

There are many other satisfiability problems. The $K$-SAT problem is for example the object of a recent breakthrough 
by Coja-Oghlan and Panagiotou [5] and Coja-Oghlan [3], who obtained the existence of a sharp threshold 
when $K$ tends to infinity. The 2-XORSAT problem is studied by Daudé and Ravelomanana [6], using Analytic 
Combinatorics to exhibit and describe precisely a phase transition phenomenon. 

The aim of the present paper is to define and study a new satisfiability model (i.e. a new distribution on the 
set of Boolean expressions) inspired by the literature on quantitative logics. 

Quantitative logics, whose origin might go back to the work of Woods [20], aims at answering this question: 
which Boolean function does a random Boolean expression represent? Once again, the answer to this question 
deeply depends on the model of randomness chosen for Boolean expressions. 

The Catalan tree model, first studied by Lefmann and Savický [13], is defined as follows: a Boolean tree is a 
binary plane rooted tree (i.e. a Catalan tree) whose internal nodes are labelled by the connective and or the connective or, and 
whose leaves are labelled by $k$ variables and their negations. Pick uniformly at random a tree among Boolean 
trees of size $n$, and denote by $P_{n,k}$ the distribution it induces on the set of Boolean functions. Lefmann and Savický 
first proved the existence of a limiting probability distribution $P_k$ on Boolean functions when the size $n$ of the 
random Boolean expression tends to infinity. 

Since the seminal paper by Chauvin et al. [2], the Analytic Combinatorics community has aimed at a better understanding of 
the Catalan tree distribution $P_k$ (and similarly defined distributions) on the set of Boolean functions. In 
particular, Kozik proves, in the Catalan tree model, an asymptotic (when $k$ tends to infinity) relation between 
the probability of a given function and its complexity (the complexity of a Boolean function being the size 
of the smallest tree representing it). His powerful approach, the pattern theory, easily classifies and counts large 
expressions according to specific structural constraints. It will be generalised in the present paper. 

Remark that in the Catalan tree model defined above, the size $n$ of the Boolean expressions tends to infinity 
while the number $k$ of variables labelling them is fixed. For technical reasons, $k$ is then sent to infinity in order to 
obtain an asymptotic estimate of the probability of a given Boolean function. It means that the trees we consider 
have a lot of repetitions in their leaves: it is legitimate to ask whether this biases the distribution induced on the set of 
Boolean functions. Genitrini and Kozik have proposed another model where random Boolean expressions 
are built on an infinite set of variables. This approach avoids the bias induced by letting $n$ tend to infinity while $k$ 
stays fixed. 

Our paper extends the Catalan model in order both (1) to let n and k tend to infinity together and (2) to fit in 
the satisfiability context. 

Following the extended abstract HE we also look at the influence of a natural notion of equivalence on the set 
of Boolean expressions and functions. Roughly speaking, we say that two expressions or functions are equivalent 
if the second one can be obtained from the first one by renumbering the variables. As an example, the expressions 
($x_1$ and $x_2$) and ($x_{12}$ and $x_{38}$) are equivalent. 

We will describe and study in parallel these two models (with and without equivalence classes) where the number 
of variables and the size of expressions jointly tend to infinity. Since the proofs will be very similar in both models, 
we will use general notations that fit both models. The model without equivalence classes will permit, as a corollary, 
to answer the satisfiability problem in the context of Catalan Boolean expressions. It will be very interesting to 
see that, although the proofs are completely similar for both models, the probability distributions induced on the 
set of Boolean functions behave differently: the introduction of equivalence classes gives birth to an interesting and 
quite mysterious threshold phenomenon. 

The paper is organised as follows. In Section 2 we define our two new models: the generalised model where the 
number of variables depends on the size of the considered trees and the quotient model where we introduce a natural 
equivalence relation on Boolean trees and functions. Section 3 is devoted to stating and discussing our three main 
results: the satisfiability question for random Catalan expressions; the link between the probability of a Boolean 
function (resp. a class of Boolean functions) and its complexity, both in the generalised and the quotient models. 
Sections 4 and 5 contain the technical core of the paper: Section 4 is an analytic part focusing mainly on 
the difficulties arising from the introduction of the equivalence relation, while Section 5 concerns both models and 
discusses Kozik's pattern theory. Finally, Section 6 contains the proofs of our main results. 


2 Description of the two models 

2.1 Contextual definitions 

A Boolean function is a mapping from $\{0,1\}^{\mathbb{N}}$ into $\{0,1\}$. The two constant functions $(x_i)_{i \ge 1} \mapsto 1$ and $(x_i)_{i \ge 1} \mapsto 0$ 
are respectively called true and false. 

An and/or tree is a binary plane tree whose leaves are labelled by literals, i.e. by elements of $\{x_i, \bar{x}_i\}_{i \in \mathbb{N}}$, and 
whose internal nodes are labelled by the connective and or the connective or, respectively denoted by $\wedge$ and $\vee$. We 
will say that $x_i$ and $\bar{x}_i$ are two different literals but they are respectively the positive and the negative version of the 
same variable $x_i$. Every and/or tree is equivalent to a Boolean expression and thus represents a Boolean function: 
for example, the tree in Fig. [l] is equivalent to the expression ([aq v (—■ X\ v aq)] v aq) v (aq ai/, where —■a; = 1 — x 
for all x e {0,1}, and represents the constant function true. 

The size of an and/or tree is its number of leaves: remark that, for all $n \ge 1$, there are infinitely many and/or 
trees of size $n$. Finally we define the tree-structure of an and/or tree to be the and/or tree where the labels of the 
leaves (but not of the internal nodes) have been removed. 

Figure 1: An and/or tree computing the constant function true. 

Definition 1. The complexity of a non constant Boolean function f, denoted by L(f), is defined to be the size 
of its minimal trees, i.e. the size of the smallest trees computing f. The complexity of true and false is defined to 
be 0. 

Although a Boolean function is defined on an infinite set of variables, it may actually depend only on a finite 
subset of essential variables. 

Definition 2. Given a Boolean function $f$, we say that the variable $x$ is essential for $f$ if, and only if, $f|_{x \leftarrow 0} \not\equiv 
f|_{x \leftarrow 1}$ (where $f|_{x \leftarrow \alpha}$ is the restriction of $f$ to the subspace where $x = \alpha$). We denote by $E(f)$ the number of essential 
variables of $f$. 
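
As an illustration of Definition 2, the following minimal sketch tests essentiality by brute force. It assumes, as a simplification that is not part of the paper's setting, that the function only reads finitely many coordinates `x[0], ..., x[num_vars - 1]`; the names `essential_variables` and `example` are hypothetical helpers introduced here.

```python
from itertools import product

def essential_variables(f, num_vars):
    """Indices i such that the restrictions of f to x_i = 0 and x_i = 1 differ.

    Assumption: f takes a tuple of num_vars bits and ignores any other variable.
    """
    essential = []
    for i in range(num_vars):
        for point in product((0, 1), repeat=num_vars):
            if point[i] == 1:
                continue  # each pair (x_i = 0, x_i = 1) is examined once
            flipped = point[:i] + (1,) + point[i + 1:]
            if bool(f(point)) != bool(f(flipped)):
                essential.append(i)
                break
    return essential

def example(x):
    # (x0 and x1) or (x0 and not x1) computes the projection x0;
    # x1 appears in the expression but is not essential, x2 never appears.
    return (x[0] and x[1]) or (x[0] and not x[1])

print(essential_variables(example, 3))
```

The example expression has four leaves but a single essential variable, which is the kind of gap between syntax and function that $E(f)$ is meant to measure.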

Remark that the complexity and the number of essential variables of a Boolean function are related by the 
following inequalities: E(f) < L(f) ^ 2 E (E+ 2 ( se e e.g. [7] p. 77-78] for the second inequality). Note that, 
asymptotically when E(f) tends to infinity a tight asymptotic upper-bound is 2E{1) /E(f), as proved by Lupanov |IB| 
for the upper bound and Lutz HU for the lower bound. 

In the whole paper, our models propose a way to make n and k tend to infinity together: 

Definition 3. Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers such that $k_n$ tends to infinity when $n$ tends to 
infinity. 


2.2 The generalised Catalan tree model 

Let us recall the definition of the Catalan tree model defined and studied by Paris et al., Lefmann and Savický [13], 
Chauvin et al. [2] and Kozik. In those papers, the authors fix an integer $k \ge 1$ and consider the uniform 
distribution on and/or trees of size $n$ whose leaf-labels are constrained to be in $\{x_1, \bar{x}_1, \dots, x_k, \bar{x}_k\}$. They study the 
induced distribution on the set of Boolean functions and prove that this distribution converges to a limit distribution 
$P_k$ when the size $n$ of the trees tends to infinity. Given a Boolean function $f$, they then prove asymptotic theorems 
for $P_k(f)$ when $k$ tends to infinity. In this approach, the order of the two limits (on $n$ and then on $k$) is a priori 
important. 

We define first the generalised Catalan tree model, that is a natural extension of the previous model. 


The model (G) is defined as follows: 

(1) consider the uniform distribution on and/or trees of size $n$ whose leaf-labels belong to $\{x_1, \bar{x}_1, \dots, x_{k_n}, \bar{x}_{k_n}\}$, 

(2) denote by $P_n$ the distribution it induces on the set of Boolean functions, and call this new distribution the 
generalised Catalan tree distribution. 

Remark that there are $A_n$ and/or trees of size $n$ labelled with $k_n$ variables, with 

$$A_n = 2^{n-1} (2k_n)^n \cdot \mathrm{Cat}_n, \quad \text{where } \mathrm{Cat}_n = \frac{1}{n} \binom{2n-2}{n-1}, \tag{1}$$

i.e. $\mathrm{Cat}_n$ is the number of binary plane trees having $n$ leaves. 

For all Boolean functions $f$, we denote by $A_n(f)$ the number of and/or trees of size $n$ labelled with $k_n$ variables 
that compute $f$. Thus, by definition, 

$$P_n(f) = \frac{A_n(f)}{A_n}.$$
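
For very small parameters, the quantities $A_n(f)$ and $A_n$ can be computed exactly, which gives a sanity check of Equation (1). The sketch below is only an illustration under our own conventions (none of the function names come from the paper): a Boolean function on $k$ variables is encoded as a bitmask over the $2^k$ assignments, and trees are counted by splitting at the root.

```python
from math import comb

def catalan_leaves(n):
    """Number of binary plane trees with n leaves: (1/n) * C(2n-2, n-1)."""
    return comb(2 * n - 2, n - 1) // n

def literal_tables(k):
    """Truth tables (bitmasks over the 2**k assignments) of the 2k literals."""
    full = (1 << (1 << k)) - 1
    tables = []
    for i in range(k):
        positive = sum(1 << a for a in range(1 << k) if (a >> i) & 1)
        tables.extend([positive, full ^ positive])
    return tables

def count_trees_by_function(n_max, k):
    """counts[n][f] = number of and/or trees of size n over k variables computing f."""
    counts = [None, {}]
    for table in literal_tables(k):
        counts[1][table] = counts[1].get(table, 0) + 1
    for n in range(2, n_max + 1):
        current = {}
        for i in range(1, n):  # i leaves in the left subtree, n - i in the right one
            for f, cf in counts[i].items():
                for g, cg in counts[n - i].items():
                    weight = cf * cg
                    current[f & g] = current.get(f & g, 0) + weight  # root labelled "and"
                    current[f | g] = current.get(f | g, 0) + weight  # root labelled "or"
        counts.append(current)
    return counts

k = 2
counts = count_trees_by_function(5, k)
for n in range(1, 6):
    total = sum(counts[n].values())
    p_false = counts[n].get(0, 0) / total  # bitmask 0 encodes the constant false
    print(n, total, 1 - p_false)           # size, A_n, probability of being satisfiable
```

The totals printed in the second column agree with $2^{n-1}(2k)^n \mathrm{Cat}_n$. The approach is exponential in $k$, so it is only meant for checking small cases, not for exploring the asymptotic regime studied in this paper.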
2.3 The quotient Catalan tree model 

A second natural generalisation of the Catalan tree model is obtained by introducing equivalence classes of Boolean 
trees and functions. The idea is the following: the functions $(x_i)_{i \ge 1} \mapsto x_1 \wedge x_2$ and $(x_i)_{i \ge 1} \mapsto x_{38} \wedge x_{12}$ can be seen 
as two realisations of the function conjunction. 

Informally, two and/or trees are equivalent if the leaves of the first one can be relabelled (and negated) without 
collision in order to obtain the second tree. We define formally this equivalence relation as follows. 

Definition 4. Let $A$ and $B$ be two and/or trees. Trees $A$ and $B$ are equivalent if 

(i) their tree-structures are identical; 

(ii) two leaves are labelled by the same variable in $A$ if and only if they are labelled by the same variable in $B$; 

(iii) two leaves are labelled by the same literal in $A$ if and only if they are labelled by the same literal in $B$. 

This equivalence relation on Boolean trees induces straightforwardly an equivalence relation on Boolean functions. 
Note that all functions of an equivalence class have the same complexity and the same number of essential variables. 
In the following, we will denote by $\langle f \rangle$ the equivalence class of the function $f$. We denote by $L\langle f \rangle = L(f)$ (resp. 
$E\langle f \rangle = E(f)$) the common complexity (resp. number of essential variables) of the elements of $\langle f \rangle$. 

Definition 5. Let $\langle f \rangle$ be a class of Boolean functions. The multiplicity of the class $\langle f \rangle$ is given by 

$$R\langle f \rangle = L\langle f \rangle - E\langle f \rangle.$$

It corresponds to the number of repetitions of variables in a minimal tree of a function from $\langle f \rangle$. 

Recall that $(k_n)_{n \ge 1}$ is an increasing sequence of integers that tends to infinity when $n$ tends to infinity. In the 
following, we only consider equivalence classes of trees having at least one element whose leaf-labels are in 
$\{x_1, \bar{x}_1, \dots, x_{k_n}, \bar{x}_{k_n}\}$. It means that we restrict ourselves to trees of size $n$ labelled by at most $k_n$ different variables. Note 
that if $k_n = n$ for all $n \ge 1$, this is not a restriction because a tree of size $n$ cannot contain more than $n$ different 
leaf-labels. 


The model (E) is defined as follows: 

(1) consider the uniform distribution on equivalence classes of trees of size $n$ (labelled with at most $k_n$ different 
variables), 

(2) the distribution it induces on the set of equivalence classes of Boolean functions is denoted by $P_n$ and called the 
quotient Catalan tree distribution. 

We denote by $A_n$ the number of equivalence classes of trees of size $n$ (in which at most $k_n$ different variables 
appear as leaf-labels). Given a class of Boolean functions $\langle f \rangle$, we denote by $A_n\langle f \rangle$ the number of equivalence classes 
of trees of size $n$ (labelled with at most $k_n$ different variables) that compute a function of $\langle f \rangle$. We thus have 

$$P_n\langle f \rangle = \frac{A_n\langle f \rangle}{A_n}.$$


Proposition 1. The number of classes of trees of size $n$ satisfies: 

$$A_n = \mathrm{Cat}_n \cdot \sum_{p=1}^{k_n} {n \brace p}\, 2^{2n-1-p},$$

where $\mathrm{Cat}_n$ is the number of (unlabelled) binary plane trees having $n$ leaves (cf. Equation (1)), and where ${n \brace p}$ is 
the Stirling number of the second kind (see footnote 1). 

Proof. An equivalence class of and/or trees can be seen as 

• a binary plane tree (factor $\mathrm{Cat}_n$) 

• whose internal nodes are labelled by and and or connectives (factor $2^{n-1}$), 

• whose leaves are partitioned into $1 \le p \le k_n$ parts (factor ${n \brace p}$), 

• each of these parts being then partitioned into two parts (one of them being possibly empty: factor $2^{n-p}$). 

□ 

Footnote 1: In Proposition 1, ${n \brace p}$ is the number of partitions of $n$ objects into $p$ non-empty subsets (see e.g. [7, p. 735-737]). 
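
The labelling part of this count can be checked by brute force for small sizes. The sketch below fixes one tree-structure and one labelling of the internal nodes, enumerates all leaf-labellings with $k$ variables, and reduces each of them to a canonical representative of its class (variables renumbered by first occurrence, first occurrence of each variable made positive). The function names are hypothetical and the canonical form is our own choice of representative, consistent with Definition 4.

```python
from itertools import product

def stirling2(n, p):
    """Stirling number of the second kind, via S(i, j) = j*S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (p + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, p) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][p]

def canonical(labelling):
    """Representative of the class of a sequence of (variable, sign) leaf labels."""
    rename, first_sign, out = {}, {}, []
    for var, sign in labelling:
        if var not in rename:
            rename[var] = len(rename)  # renumber variables by first occurrence
            first_sign[var] = sign     # the first occurrence becomes positive
        out.append((rename[var], sign == first_sign[var]))
    return tuple(out)

def count_label_classes(n, k):
    """Number of classes of labellings of n leaves using at most k variables."""
    literals = [(v, s) for v in range(k) for s in (True, False)]
    return len({canonical(lab) for lab in product(literals, repeat=n)})

def label_class_formula(n, k):
    """Sum over p of S(n, p) * 2**(n - p), the labelling factor in Proposition 1."""
    return sum(stirling2(n, p) * 2 ** (n - p) for p in range(1, min(n, k) + 1))

for n, k in [(3, 2), (4, 3), (5, 2)]:
    print(n, k, count_label_classes(n, k), label_class_formula(n, k))
```

Multiplying the labelling count by $\mathrm{Cat}_n \cdot 2^{n-1}$ recovers the expression of $A_n$ in Proposition 1, since equivalent trees share their tree-structure and the labels of their internal nodes.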

Remark on notations: We have already used the notation $A_n$ to define the model (G). We will keep the same 
notation for these two distinct objects because they will have the same role in the proofs. But formally, we have 

$$A_n^{(G)} = \mathrm{Cat}_n \cdot 2^{2n-1} \cdot k_n^n \quad \text{and} \quad A_n^{(E)} = \mathrm{Cat}_n \cdot \sum_{p=1}^{k_n} {n \brace p}\, 2^{2n-1-p}.$$


3 Main results and discussion 


We have defined the two models we are interested in: the generalised and the quotient Catalan tree distributions. 
Both distributions are called $P_n$ for simplicity's sake, but we will use $P_n^{(G)}$ and $P_n^{(E)}$ when the precision is needed. 
The aim of this paper is to study the behaviour of both distributions when the size $n$ of the considered trees tends 
to infinity. 

Let us remark that the distribution induced by (G) is based on a uniform distribution among trees of the same 
size, whereas the distribution induced by (E) relies on a uniform distribution among classes of trees of the same size. 
The two induced distributions on Boolean functions are therefore distinct. 


Theorem 1 (Model (G)). Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers tending to infinity when $n$ tends to 
infinity. For all Boolean functions $f$, there exists a positive constant $\alpha_f^{(G)}$ such that, asymptotically when $n$ tends to 
infinity, 

$$P_n(f) \sim \alpha_f^{(G)} \cdot \left(\frac{1}{k_n}\right)^{L(f)+1}.$$


This result has an interesting corollary concerning the Catalan-SAT problem: recall that a Boolean expression 
is said to be satisfiable if it does not represent the constant function false. 


Corollary 1 (Catalan-SAT). Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers tending to infinity when $n$ tends 
to infinity. Pick uniformly at random an and/or tree of size $n$ with leaf-labels in $\{x_1, \bar{x}_1, \dots, x_{k_n}, \bar{x}_{k_n}\}$. This 
random and/or tree is equivalent to a Boolean expression that is satisfiable with probability tending to 1 when $n$ 
tends to infinity. 

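Theorem 2 (Model (E)). Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers tending to infinity when $n$ tends to 
infinity. There exists a sequence $(M_n)_{n \ge 1}$ such that $M_n \sim_{n \to \infty} n/\ln n$ and such that, for all fixed equivalence classes 
of Boolean functions $\langle f \rangle$, there exists a positive constant $\alpha^{(E)}_{\langle f \rangle}$ satisfying: 

(i) if, for all sufficiently large $n$, $k_n \le M_n$, then, asymptotically when $n$ tends to infinity, 

$$P_n\langle f \rangle \sim \alpha^{(E)}_{\langle f \rangle} \cdot \left(\frac{1}{k_n}\right)^{R\langle f \rangle + 1};$$

(ii) if, for all sufficiently large $n$, $k_n \ge M_n$, then, asymptotically when $n$ tends to infinity, 

$$P_n\langle f \rangle \sim \alpha^{(E)}_{\langle f \rangle} \cdot \left(\frac{\ln n}{n}\right)^{R\langle f \rangle + 1}.$$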


First note, that we could give some corollary about satisfiability for the second model (E) too. However, in the 
classical context of SAT problems, there are no quotient formulas. So we omit this by-product. 

Let us discuss these results in view of the classical Catalan tree distribution studied by Chauvin et al. [2] and Kozik: let us recall 
briefly its definition. Let $k \ge 1$ be an integer. We denote by $T_{n,k}$ the number of trees of size $n$, with leaf-labels in 
$\{x_1, \bar{x}_1, \dots, x_k, \bar{x}_k\}$. Given a Boolean function $f$, we denote by $T_{n,k}(f)$ the number of such trees computing $f$. The 
Catalan distribution is thus defined by, for all Boolean functions $f$, 

$$P_k(f) := \lim_{n \to +\infty} \frac{T_{n,k}(f)}{T_{n,k}}.$$

The existence of the above limit is proved in [13] or [2]. Kozik proved: 
Theorem 3 (Kozik). Let $k$ be a fixed positive integer. For all Boolean functions $f$, there exists a positive 
constant $c_f$ such that 

$$P_k(f) \sim_{k \to \infty} c_f \cdot \left(\frac{1}{k}\right)^{L(f)+1}.$$


As one can see, Theorems 1 and 3 are very similar, and we will see that their proofs are also very similar after 
having observed a simple but fundamental trick: one has to consider separately the tree-structure of an and/or 
tree and its leaf-labelling. It was not clear before this work how to generalise Kozik's proof in order to tackle the 
Catalan-SAT problem (cf. Corollary 1). 


Introducing equivalence classes makes things different, and an interesting threshold effect appears (see Theorem 2). 
We still have no intuition for this threshold, although we will see in the proof where it comes from. 

In the classical Catalan tree model, each Boolean function is studied separately instead of being considered among 
its equivalence class. We can translate the result obtained by Kozik in terms of equivalence classes by summing 
over all Boolean functions belonging to a given equivalence class: note that there are (#(/)) functions in the 
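equivalence class of $f$. Therefore, the result of Kozik is equivalent to: for all classes $\langle f \rangle$, there exists a constant $c_{\langle f \rangle}$ 
such that, asymptotically when $k$ tends to infinity, 

$$P_k\langle f \rangle \sim c_{\langle f \rangle} \left(\frac{1}{k}\right)^{L(f) - E(f) + 1} = c_{\langle f \rangle} \left(\frac{1}{k}\right)^{R\langle f \rangle + 1}.$$
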

The classical Catalan tree distribution can be seen as a degenerate case of our model where there exists a fixed 
integer $k$ such that $k_n = k$ for all $n \ge 1$. Recall that we assume in the present paper that $k_n$ tends to infinity when 
$n$ tends to infinity: the case $k_n = k$ is thus not a particular case of our results, but only a degenerate one. 

Once again, the proof of Theorem 2 relies on similar ideas as Kozik's proof of Theorem 3. To emphasise the 
similarities between the proofs of our two main theorems (Theorems 1 and 2), we will develop their proofs together 
in Section 6. 

4 Technical key point 

As we already mentioned, the key idea of this paper is to separate the tree-structure of an and/or tree and its 
leaf-labelling. Recall that 

$$A_n^{(G)} = 2^{n-1}\, \mathrm{Cat}_n \cdot (2k_n)^n \quad \text{and} \quad A_n^{(E)} = 2^{n-1}\, \mathrm{Cat}_n \cdot \sum_{p=1}^{k_n} {n \brace p}\, 2^{n-p}.$$

For all $m, n \ge 1$, let us denote by 

$$\mathrm{Lab}_{n,m} := \begin{cases} (2m)^n & \text{in model (G);} \\[4pt] 2^n \displaystyle\sum_{p=1}^{m} {n \brace p}\, 2^{-p} & \text{in model (E).} \end{cases}$$

In both models, $\mathrm{Lab}_{n,m}$ corresponds to the number of ways to label the $n$ leaves with $m$ variables, thus 

$$A_n = 2^{n-1}\, \mathrm{Cat}_n \cdot \mathrm{Lab}_{n,k_n}.$$

Finally, let us introduce the key quantity 

$$\mathrm{rat}_n := \frac{\mathrm{Lab}_{n-1,k_n}}{\mathrm{Lab}_{n,k_n}}.$$

Note that in the model (G), the quantity $1/\mathrm{rat}_n = 2k_n$ corresponds to the number of possible labellings of 
the $n$-th leaf once the other leaves are already labelled. In the model (E), the leaf-labellings are no longer 
independent and the quantity $1/\mathrm{rat}_n$ is thus less explicit. A detailed analysis of this quantity is needed in the 
following. This section is devoted to its asymptotic analysis. 
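
To make these definitions concrete, here is a minimal sketch that evaluates $\mathrm{Lab}_{n,m}$ and $\mathrm{rat}_n$ exactly for small values. The function names and the `model` argument are our own conventions, and the values $k = \lfloor n / \ln n \rfloor$ used in the loop are only an arbitrary illustration, not the sequence $(M_n)$ of the paper.

```python
from fractions import Fraction
from math import log

def stirling_row(n):
    """Row n of the Stirling numbers of the second kind: [S(n, 0), ..., S(n, n)]."""
    row = [1]
    for i in range(1, n + 1):
        new = [0] * (i + 1)
        for j in range(1, i + 1):
            new[j] = (j * row[j] if j < len(row) else 0) + row[j - 1]
        row = new
    return row

def lab(n, m, model):
    """Lab_{n,m}: (2m)**n in model "G", sum over p <= m of S(n, p) * 2**(n - p) in model "E"."""
    if model == "G":
        return (2 * m) ** n
    row = stirling_row(n)
    return sum(row[p] * 2 ** (n - p) for p in range(1, min(n, m) + 1))

def rat(n, k, model):
    """rat_n = Lab_{n-1,k} / Lab_{n,k}, as an exact fraction."""
    return Fraction(lab(n - 1, k, model), lab(n, k, model))

for n in (20, 40, 80):
    k = max(1, int(n / log(n)))
    print(n, k, float(rat(n, k, "G")), float(rat(n, k, "E")))
```

In model (G) the ratio is exactly $1/(2k)$ for every $n$. In model (E) the exact values can only suggest the behaviour for moderate $n$; the asymptotic statements below are what the proofs rely on.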


Proposition 2. Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers tending to infinity when $n$ tends to infinity. 

(G) For all integers $p$, 

$$\frac{\mathrm{Lab}_{n-p,k_n}}{\mathrm{Lab}_{n,k_n}} = \frac{1}{(2k_n)^p}.$$

(E) There exists a sequence $(M_n)_{n \ge 1}$ with $M_n \sim_{n \to \infty} n/\ln n$ and such that, for all integers $p$, asymptotically when $n$ 
tends to infinity, 

$$\frac{\mathrm{Lab}_{n-p,k_n}}{\mathrm{Lab}_{n,k_n}} = \begin{cases} \dfrac{1 + o(1)}{(2k_n)^p} & \text{if } k_n \le M_n \text{ for large enough } n; \\[8pt] (1 + o(1)) \left(\dfrac{\ln n}{2n}\right)^p & \text{if } k_n \ge M_n \text{ for large enough } n. \end{cases}$$

In particular, taking $p = 1$ gives 

Proposition 3. Let $(k_n)_{n \ge 1}$ be an increasing sequence of integers tending to infinity when $n$ tends to infinity. 

(G) $\mathrm{rat}_n = \dfrac{1}{2k_n}$. 

(E) There exists a sequence $(M_n)_{n \ge 1}$ with $M_n \sim_{n \to \infty} n/\ln n$ and such that, asymptotically when $n$ tends to infinity, 

$$\mathrm{rat}_n = \begin{cases} \dfrac{1 + o(1)}{2k_n} & \text{if } k_n \le M_n \text{ for large enough } n; \\[8pt] (1 + o(1))\, \dfrac{\ln n}{2n} & \text{if } k_n \ge M_n \text{ for large enough } n. \end{cases}$$


Remark that, with this definition of $\mathrm{rat}_n$, Theorems 1 and 2 can be rephrased as: for all Boolean functions $f$, 
there exist constants $\lambda_f$ and $\lambda_{\langle f \rangle}$ such that 

$$P_n^{(G)}(f) \sim \lambda_f \cdot \mathrm{rat}_n^{L(f)+1},$$

and 

$$P_n^{(E)}\langle f \rangle \sim \lambda_{\langle f \rangle} \cdot \mathrm{rat}_n^{R\langle f \rangle + 1}.$$

The proof of Proposition 3 (G) is obvious and the rest of this section is devoted to the more technical proof of 
Proposition 3 (E). 

The following proposition, which can be seen as a particular case of the Bonferroni inequalities, allows us to exhibit 
bounds on $\mathrm{Lab}_{n,k_n}$. 

Proposition 4 (cf. for example 03]). For all $n \ge 1$, for all $p \in \{1, \dots, n\}$, 

$$\frac{p^n}{p!} - \frac{(p-1)^n}{(p-1)!} \le {n \brace p} \le \frac{p^n}{p!}.$$
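
These two bounds are easy to test numerically. The following sketch checks them with exact rational arithmetic; it is a sanity check on small values, not a proof, and the helper names are ours.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, p):
    """Stirling number of the second kind S(n, p)."""
    if n == p:
        return 1
    if p == 0 or p > n:
        return 0
    return p * stirling2(n - 1, p) + stirling2(n - 1, p - 1)

def bounds_hold(n, p):
    """Check p^n/p! - (p-1)^n/(p-1)! <= S(n, p) <= p^n/p! exactly."""
    upper = Fraction(p ** n, factorial(p))
    lower = upper - Fraction((p - 1) ** n, factorial(p - 1))
    return lower <= stirling2(n, p) <= upper

print(all(bounds_hold(n, p) for n in range(1, 30) for p in range(1, n + 1)))
```

For $p = 1$ both bounds coincide with ${n \brace 1} = 1$, so the inequalities are tight at that end.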

In view of these inequalities and of the expression of $\mathrm{Lab}_{n,k_n}$, the following sequences naturally appear: 

Lemma 1. Let $n$ be a positive integer. 

(i) The following sequence is unimodal: 

$$\left(a_p^{(n)}\right)_{p \in \{1, \dots, n\}} = \left(\frac{p^n}{p!\, 2^p}\right)_{p \in \{1, \dots, n\}},$$

i.e. there exists an integer $M_n$ such that $(a_p^{(n)})_p$ is strictly increasing on $\{1, \dots, M_n\}$ and strictly decreasing 
on $\{M_n + 1, \dots, n\}$. 

(ii) Moreover, the sequence $(M_n)_n$ is increasing and asymptotically satisfies: 

$$M_n \sim \frac{n}{\ln n}.$$
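
The shape of the sequence $(a_p^{(n)})_p$ can be observed directly. In the sketch below (function names are ours), the comparison $a_{p+1}^{(n)} > a_p^{(n)}$ is rewritten as the integer inequality $(p+1)^n > 2(p+1)\,p^n$, so no floating-point arithmetic is involved in locating the maximum.

```python
from math import log

def rising_pattern(n):
    """rising[p-1] is True when a_{p+1} > a_p, where a_p = p**n / (p! * 2**p).

    a_{p+1} > a_p is equivalent to (p+1)**n > 2*(p+1) * p**n (exact integer test).
    """
    return [(p + 1) ** n > 2 * (p + 1) * p ** n for p in range(1, n)]

def mode(n):
    """Index at which a_p is maximal (the first one in case of a tie)."""
    return 1 + sum(rising_pattern(n))

for n in (10, 100, 1000):
    print(n, mode(n), round(n / log(n), 1))
```

The printed values only indicate the order of magnitude: the equivalence $M_n \sim n/\ln n$ is asymptotic and the convergence is slow, so the two columns should not be expected to agree closely for such small $n$.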
Proof. (i) Let us prove that the sequence $(a_p^{(n)})_{1 \le p \le n}$ is log-concave, i.e. that the sequence $\left(a_{p+1}^{(n)} / a_p^{(n)}\right)_{1 \le p \le n-1}$ is 
decreasing. Let $p$ be an integer in $\{1, \dots, n-1\}$. By definition of $a_p^{(n)}$: 

$$\frac{a_{p+1}^{(n)}}{a_p^{(n)}} = \left(\frac{p+1}{p}\right)^n \frac{1}{2(p+1)},$$

and consequently, for all $n \ge 0$, 

$$\frac{a_{p+1}^{(n)}}{a_p^{(n)}} > 1 \iff n \ln\left(\frac{p+1}{p}\right) - \ln(2p+2) > 0.$$

The function $\phi_n : p \mapsto n \ln\left(\frac{p+1}{p}\right) - \ln(2p+2)$ is strictly decreasing. Note that $\phi_n(1)$ tends to $+\infty$ and $\phi_n(n-1)$ 
tends to $-\infty$ when $n$ tends to infinity. Then, for all $n$ large enough, there exists a unique $M_n$ such that 
$(a_p^{(n)})_{1 \le p \le n}$ is strictly increasing on $\{1, \dots, M_n\}$ and strictly decreasing on $\{M_n + 1, \dots, n\}$. Let us suppose $n$ large 
enough for the rest of the proof. 


(ii) Let us denote by $x_n$ the single solution of the equation 

$$\left(\frac{x+1}{x}\right)^n \frac{1}{2(x+1)} = 1, \tag{2}$$

when it exists. 

First remark that the sequence $(x_n)_{n \ge 1}$ is increasing. We indeed know: $\phi_n(x_n) = 0$ and $\phi_{n+1}(x_{n+1}) = 0$, which 
implies that $\phi_n(x_{n+1}) = -\ln\left(1 + \frac{1}{x_{n+1}}\right) < 0$. Therefore, since for each $n$, the function $\phi_n$ is decreasing, we have 
that $x_{n+1} \ge x_n$, for all large enough $n$. Therefore, the sequence $(M_n)_{n \ge 1}$ is asymptotically increasing. 

Since, asymptotically when $n$ tends to infinity, 

$$\left(1 + \frac{\ln n}{n}\right)^n \frac{1}{2\left(\frac{n}{\ln n} + 1\right)} \sim \frac{\ln n}{2},$$

we have that $n/\ln n < x_n$ and therefore, $x_n$ tends to infinity. Thus, Equation (2) evaluated in $x_n$ is equivalent to 

$$n \ln\left(1 + \frac{1}{x_n}\right) = \ln 2 + \ln(x_n + 1), \tag{3}$$

which implies $x_n \ln x_n \sim n$, when $n$ tends to infinity. We easily deduce from this asymptotic relation that $\ln x_n \sim \ln n$ 
and that $x_n \sim n/\ln n$ when $n$ tends to infinity. Since $M_n = \lfloor x_n \rfloor$, we conclude that $M_n \sim n/\ln n$, when $n$ tends to 
infinity. □ 
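
The solution $x_n$ of Equation (2) has no closed form, but it is easy to approximate. The following sketch uses a plain bisection on $\phi_n$ over $[1, n]$, which is legitimate for $n \ge 3$ because $\phi_n$ is strictly decreasing with $\phi_n(1) > 0 > \phi_n(n)$; the function names and the number of iterations are arbitrary choices of ours.

```python
from math import log

def phi(n, x):
    """phi_n(x) = n * ln((x + 1) / x) - ln(2x + 2); it vanishes exactly at x_n."""
    return n * log((x + 1) / x) - log(2 * x + 2)

def solve_x(n, iterations=200):
    """Approximate the root x_n of phi_n by bisection on [1, n]."""
    if n < 3:
        raise ValueError("phi(n, 1) > 0 is needed, which requires n >= 3")
    lo, hi = 1.0, float(n)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if phi(n, mid) > 0:
            lo = mid  # the root is to the right of mid
        else:
            hi = mid  # the root is at mid or to its left
    return (lo + hi) / 2

for n in (10, 100, 1000, 10 ** 6):
    x = solve_x(n)
    print(n, round(x, 3), round(x * log(x) / n, 3))
```

The last column shows the ratio $x_n \ln x_n / n$, which the proof shows tends to 1; its slow drift illustrates why the equivalent $n / \ln n$ is only a first-order description of $x_n$.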


We are now ready to understand the asymptotic behaviour of $\mathrm{Lab}_{n,k_n}/2^n$: roughly speaking, asymptotically, the 
sum $\mathrm{Lab}_{n,k_n}/2^n$ essentially depends only on the terms around $M_n$. 

Lemma 2. Let $(u_n)_{n \ge 1}$ be an increasing sequence such that $u_n \le n$ for all integers $n \ge 1$ and $u_n$ tends to infinity 
when $n$ tends to infinity. 

(i) If, for all large enough $n$, $u_n \le M_n$, then, for all sequences $(\delta_n)_{n \ge 1}$ such that $\delta_n = o(u_n)$ and $\frac{u_n \sqrt{\ln u_n}}{\sqrt{n}} = o(\delta_n)$, 
we have, asymptotically when $n$ tends to infinity, 

$$\frac{\mathrm{Lab}_{n,u_n}}{2^n} = (1 + o(1)) \sum_{p = u_n - \delta_n}^{u_n} \frac{p^n}{p!}\, 2^{-p}. \tag{4}$$



(ii) If, for large enough n, u n ^ M n , then, for all sequences (S n ) n ^l such that 6 n = o(u n ) and — o(6 n ), 

2 _ 

for all sequences (rj n ) n ^i such that r] n = o(M n ), lim„^ +00 jf- = +oo and yM„ ln(ii n — M n ) = o(rj n ), we 
have, asymptotically when n tends to +oo, 


$$\frac{\mathrm{Lab}_{n,u_n}}{2^n} = (1 + o(1)) \sum_{p = M_n - \delta_n}^{\min\{M_n + \eta_n,\, u_n\}} \frac{p^n}{p!}\, 2^{-p}. \tag{5}$$

Proof of Lemma 2 (i). Via Proposition 4 we can bound $\mathrm{Lab}_{n,u_n}/2^n$: for all $n \ge 1$, 

$$\frac{1}{2} \sum_{p=1}^{u_n} \frac{p^n}{p!\, 2^p} \le \frac{\mathrm{Lab}_{n,u_n}}{2^n} \le \sum_{p=1}^{u_n} \frac{p^n}{p!\, 2^p}. \tag{6}$$

Let us assume that $u_n \le M_n$ for all large enough $n$, and let us prove that the two bounds of Equation (6) are 
of the same asymptotic order when $n$ tends to infinity. 

Denote, for all integers $m \ge 1$, $S_m = \sum_{p=1}^{m} a_p^{(n)}$. Thus Equation (6) implies 

$$\frac{S_{u_n}}{2} \le \frac{\mathrm{Lab}_{n,u_n}}{2^n} \le S_{u_n}.$$

Let us split the sum $S_{u_n}$ into two parts: the last $\delta_n$ summands, and the rest. 

$$S_{u_n} = S_{u_n - \delta_n - 1} + \sum_{p = u_n - \delta_n}^{u_n} a_p^{(n)}.$$

By assumption, $\delta_n = o(u_n)$ and we therefore can choose $n$ large enough such that $u_n > \delta_n$. Let us prove that 
$S_{u_n - \delta_n - 1}$ is negligible in front of $a_{u_n}^{(n)}$, and thus in front of $\sum_{p = u_n - \delta_n}^{u_n} a_p^{(n)}$. Recall that $(a_p^{(n)})_p$ is increasing on 
$\{1, \dots, M_n\}$, which implies 

$$S_{u_n - \delta_n - 1} \le u_n \cdot a_{u_n - \delta_n}^{(n)}.$$


For all large enough n, via Stirling formula, we deduce: 

U n S n \ U n ? 


U Un -8 n _ 2 8 n 


= exp 


u r 
Sn hi 


(u n - 5 n )! 


2 u r 

e 


S n / c \ n—u n +8 n — h 

U n ~ 0 n 


(1 + 0 ( 1 )) 


2 u r 

e 


Since S n = o(u n ), we get In (l - ^ 


+ \ n - u n + <L - - In 1 - 


+ o 


+ o(l) 


Moreover, u n < M n thus, 


Uu n —8n 


= exp 


r , o , c , n6 n nSf (nS 2 

d n In 2 + d n in u n — --+ o 

Ufi, ^u~ 


Therefore, by using u n < M n , and Equation (j3j, we deduce fff- ^ In 2 + In M n , 

— 5 


+ exp 
+ exp 


2u- 




n<5^ / 

'2^ + ° 


From the assumption = o(S n ), we deduce lnu n = o f^p-Y thus we can conclude 

= o(l). 


^u n —5 n — 1 _A 

-s= u n -< exp 


1 n ^n 




And consequently, we get 5, 


n—>oo 


sr^u n (n) 

2jd=u„-(S„ °P ■ 


n 


Proof of Lemma 2 (ii). Assume that $u_n \ge M_n$ for all large enough $n$. Let us split the sums of the lower and upper 
bounds of Equation (6) into three parts: the first from index 1 to $M_n - \delta_n - 1$, the second from index $M_n - \delta_n$ to 
$M_n + \eta_n$, and the third from index $M_n + \eta_n + 1$ to $u_n$. Remark that, if $u_n \le M_n + \eta_n$, then the third part is empty 
and the second one is truncated: 

$$S_{u_n} = S_{M_n - \delta_n - 1} + \sum_{p = M_n - \delta_n}^{M_n + \eta_n} a_p^{(n)} + \sum_{p = M_n + \eta_n + 1}^{u_n} a_p^{(n)}.$$
By arguments similar to those developed in the proof of assertion (i), we can prove that $S_{M_n - \delta_n - 1}$ is negligible in 
front of $a_{M_n}^{(n)}$, and thus in front of $\sum_{p = M_n - \delta_n}^{M_n} a_p^{(n)}$. Therefore, if $u_n \le M_n + \eta_n$, assertion (ii) is proved. Let us now 
assume that $u_n \ge M_n + \eta_n + 1$: to end the proof, we prove that $\sum_{p = M_n + \eta_n + 1}^{u_n} a_p^{(n)}$ is negligible in front of $a_{M_n}^{(n)}$, and 
thus in front of $\sum_{p = M_n - \delta_n}^{M_n + \eta_n} a_p^{(n)}$. 

In view of Lemma we have 


^ dp l) (u n - M n - rj n ) ■ d n) 


M n +r) n 


p=M n +r) r j, + l 


Via Stirling formula, 

„(«) 


(n) 

X M„ 


= 2 “ 


= exp 


M n + T] r 


M n \ f 2(M n + r ln ) Y^ ( M n + n 


(• M n + r] n )\ \ e 


M n 


%~M n -i 


(1 + 0 ( 1 )) 


-r)n In 

Since In (l + < IfS we § et: 


2(M n + 7? n ) 


+ ( n - M„ - - ) In ( 1 + ) + °(1) 


a 


(n) 

M n +r) n 
(n) 




exp 


= exp 


—r] n In2 + r] n - r] n \n(M n + rj n ) + -^-(n - M n - ^) + o(l) 


-r) n In 2 - ?y n ln(M n + r/„) + + o(l) 


Our assumption states = o(l), thus 


a 


(n) 

Mn+ljn 

(n) 

M n 


< 


a 


exp 


= exp 


-iIn In 2 — rj n In M n - rj n In ( 1 + J + + o(l) 




Since M n = [x n \, we have 


therefore 


Equation ([3]) implies: 


n In ( 1 H-) = n 

x, 


+ o 


M n 2 Ml \Ml))' 


±-\ 


In 2 + ln(j; B + 1) = In 2 + In M rl + O ( —— ) . 

. ■‘■v-ln 


Ti Tl 

— =ln2 + ln M n + — + 0 
= ln2 + ln Mn +^- + 0 


n \ 

Ml) 

n \ 

Ml) 


+ o 


1 

Mn 


because dr = °i'W s )- Thus, we conclude 


a 


(n) 

Mn + r/n 

(n) 

M„ 


< 


exp 


a 


Mr 


+ o 


Tn 

Ml 


+ o 


( nr ln \ 

\M%) 


= exp 


Mr 


+ o 


Vd 

Mr 


because, from assumption: y/M n ln(u„ — M n ) = o(j] n ), we deuce y/M n = o[rj n ). Finally we get 


sr^u n 


ip=M„+?7„ + l a P 
An) 
l M n 


(n) 


+ (u n M n TJn'j - 


a 


(n) 

M n +7l n 

(n) 


+ exp 


HUn~M„)-^- + o[^ 


Mr. 


Tn 


Mr 


= o(l), 
since, by assumption, $\sqrt{M_n \ln(u_n - M_n)} = o(\eta_n)$. Therefore, asymptotically when $n$ tends to infinity, 

$$S_{u_n} \sim \sum_{p = M_n - \delta_n}^{M_n + \eta_n} a_p^{(n)},$$

which concludes the proof. □ 

We are now ready for the proof of Proposition 2: let us decompose this proof in the two following Lemmas 3 and 4. 

Lemma 3. Let $(k_n)_{n \ge 1}$ be a sequence of integers such that $k_n \le M_n$ for large enough $n$, then, for all integers $p$, 
asymptotically when $n$ tends to infinity, 

$$\frac{\mathrm{Lab}_{n-p,k_n}}{\mathrm{Lab}_{n,k_n}} = (1 + o(1))\, \frac{1}{(2k_n)^p}.$$

Proof, (i) Let us first assume that k n < M n _ p . Let (S n ) n ^i an integer-valued sequence such that S n = o(k n ) 


and 


k n \/ In k 


infinity, 


— = o(S n ) when n tends to infinity. Lemma j2j applied to u n = k n gives, asymptotically when n tends to 


Lab n h 


= (i + o(i)) £ 


,(«) 


i=k n —S n 


Moreover, since k n ^ M n _ p , and since the sequence (S n ) n ^ 1 satisfies S n = o{k n ) and 
Lemma [2] to the sequence u n = k n gives us, asymptotically when n tends to infinity, 

k„ 


k n -y/ln k n 
y/n—p 


= o(5 n ), applying 


Lab n—p,k n 
2 n—p 


= (1 + 0 ( 1 )) 2 


,0-p) 


i=k n —S n 


Therefore, 

We have 

(*»-«' i; « { r r) 


yfe„ Jn-p) 

= (2^ + o(l)) 


Lab n—p,k n fn—v 1 \\ Zji=fc n — S n u i 

Lab n.k n 


ST'Kn 
2—ii = kn — 


i=k n — 5 n 


(«) 


k n k n 

;pA n ~P) _ V „(") 


s 


i—k n —&n 


which implies 


«£ > i^a. 

i = k n — Sn 

Lab n—p.kn 


= s «r = s 


i p a { J l ~ p) ^ 


i = k n —Sn 


i=k n —S n 


K 2 

P=k n — S n 


i(n-p) 


Lab 


n,k n 


(2 k n )t 


when n —* + 00 . 


(ii) Now assume that $M_{n-p} < k_n \le M_n$. Let $(\delta_n)_{n\ge 1}$ be an integer-valued sequence such that $\delta_n = o(k_n)$ and $\frac{k_n\sqrt{\ln k_n}}{\sqrt{n}} = o(\delta_n)$. Let $(\eta_n)_{n\ge 1}$ be an integer-valued sequence such that $\eta_n = o(M_{n-p})$ and $\sqrt{M_{n-p}\ln(k_n - M_{n-p})} = o(\eta_n)$. Applying Lemma 2 (ii) to the sequence $u_n = k_n$, we obtain

$$\frac{\mathrm{Lab}_{n-p,k_n}}{2^{n-p}} = (1+o(1)) \sum_{i=M_{n-p}-\delta_n}^{\min\{M_{n-p}+\eta_n,\,k_n\}} a_i^{(n-p)}.$$

Moreover, since $\delta_n = o(k_n)$ and $\frac{k_n\sqrt{\ln k_n}}{\sqrt{n}} = o(\delta_n)$, via Lemma 2 (i) applied to the sequence $u_n = k_n$,

$$\frac{\mathrm{Lab}_{n,k_n}}{2^{n}} = (1+o(1)) \sum_{i=k_n-\delta_n}^{k_n} a_i^{(n)}.$$

Let us remark, as above, that

$$(k_n-\delta_n)^p \sum_{i=k_n-\delta_n}^{k_n} a_i^{(n-p)} \le \sum_{i=k_n-\delta_n}^{k_n} a_i^{(n)} \le k_n^p \sum_{i=k_n-\delta_n}^{k_n} a_i^{(n-p)}.$$

Moreover, since $k_n > M_{n-p}$, using similar arguments as those developed to prove Lemma 2 (ii),


A n ~p) 


min{k n iMn—p+rjn} 


z 


i=k n —S n 


(n—p) Lab n—pikn 

i 2 n ~P 


Z 

i=k n —S n 

Therefore, since S n = o(k n ), we get 

which concludes the proof. 

Lemma 4. Let $(k_n)_{n\ge 1}$ be a sequence of integers that tends to infinity when $n$ tends to infinity. Let us assume that $k_n \ge M_n$ for large enough $n$. Then, for every integer $p$, asymptotically when $n$ tends to infinity,


Lab„_ p , fc „ = (1 + o(1)) 1 


Lab 


(2 k n )P : 


D 


Labr,._ 


*n—p,k n 


Lab 


'n,k„ 


= (i + o(D) 


In n 


Proof. By assumption, k n ^ M n , which implies k n ^ M n - p . Let (S n ) n ^\ be a sequence of integers such that 
S n = o(M n _ p ) and Mn = 0(6 n+p ). Let (?? n ) n ^i be another sequence of integers such that p n = o(M n _ p ), 

and \jM n In (k n — M n ) = o(rj n+p ). We thus can apply Lemma [ 2 ] (ii) to u n = k n and conclude that, asymptotically 
when n tends to infinity, 


Lab n—p,k r 

2 n ~p 


min{ M n — P -\-r/mk n } 

-(l + o(l)) 2 

i=M n _ v —5 n 


Moreover, since the sequence (6 n )nis 1 verifies 5 n = o{M n _ p ) = o{M n ) and Mrl = o(S n+p ) = o(S n ), and 

the sequence (r) n )n^i verifies r] n = o(M n _ p ) = o(M„), and y/ M n ln(fo„ - M n ) = o(rj n+p ) = o{r] n ), we have, 


since 


min {M n +rjrnk n } 

. (1 + 0(1)) 2 «! n) - 

i=M n —S n 


Lab n ^k 


Let us note that 


min{M n -\-p n ,k n } m\n{M ri +r) ri ,k ri } 

(Mn-Sn)* 2 af- p U La ^<(M n + Vn y ^ af-rt. 

i. — A A — A i.— A A — A 


Since k n ^ M n ^ M„_ p , via similar arguments to those developed for the proof of Lemma [2] (ii) , we get 


min {Mn+rjnikn} 


z 


,(n-p) 


min { M n _ v + r ] n , k n } 


i=M n —S n 


z 


,(n-p) 


i=M n —S n 


We thus have to compare 


and 


m.in{M n _ p -\-r) ri ,k n } 

Sn = S “i”" 1 ” 

i=M n —5 n 
min{ M n _ p + t) n , k n } 

Tn- Y. 

i=M n —p — 8 n 


and to prove that those two sums are equivalent when n tends to infinity. Decompose S n as follows: 

min{M„+T)„,fc„} M n —S 

- z 


S n = T n + 


M n —6 n 

(n—p) V 1 „(n-p) 


D n ~ J -n 1* Z a i 2-1 a 

i=min{M n -p+ri ri ,k ri } i=M n - v —S.

Arguments from the proof of Lemma 2 (ii) imply that the second summand is negligible in front of the first. Let us assume that the third term is non-zero, i.e. $M_n - \delta_n > M_{n-p} - \delta_n$ (note that if this term is zero then $S_n \sim T_n$
is already proved). Via Lemma 1 since A f Irl = 1 + o(^g-), we have 




2 «S" _ ” ' (Mn -Sn- M„ + “ <+) "AfU ~ « (4”A) • 


i=M n _ v — 5. 


in view of Lemma jsj (i). Therefore, since ci^n-l ^ we have Sn ~ T n when n tends to infinity, which implies, 
since r} n = o(M n ) and S n = o(M n ), 


Lab n—p,k n 
Lab n ,k n 


(1 + 0 ( 1 )) 


1 

(2M„)p 


(l + o(l)) 



□ 


Finally, this fundamental technical part allows us to use Kozik’s key ideas in order to describe the probability 
distribution induced on Boolean functions, in our two new models. 


5 Adjustment of Kozik’s pattern language theory 

In 2008, Kozik [14] introduced an effective way to study Boolean trees: he defined a notion of pattern that makes it easy to classify and count large trees according to constraints on their structure. Kozik applied this pattern theory to the classical Catalan tree distribution. We recall the definitions of patterns, illustrate them on examples, and then extend the results of Kozik's paper in order to use them in our new models. This part makes extensive use of Analytic Combinatorics (generating functions, symbolic methods, singularity analysis): we refer the reader to Flajolet & Sedgewick's book [7] for an introduction to these methods.

Definition 6. (i) A pattern is a binary tree with internal nodes labelled by $\wedge$ or $\vee$ and with external nodes labelled by $\bullet$ or $\square$. Leaves labelled by $\bullet$ are called pattern leaves and leaves labelled by $\square$ are called place-holders. A pattern language is a set of patterns.

(ii) Given a pattern language $L$ and a family of trees $\mathcal{M}$, we denote by $L[\mathcal{M}]$ the family of all trees obtained by replacing every place-holder in an element of $L$ by a tree from $\mathcal{M}$.

(iii) We say that $L$ is unambiguous if, and only if, for any family $\mathcal{M}$ of trees, any tree of $L[\mathcal{M}]$ can be built from a unique pattern from $L$ into which trees from $\mathcal{M}$ have been plugged.

The generating function of a pattern language $L$ is $\ell(x,y) = \sum_{d,p} L(d,p)\, x^d y^p$, where $L(d,p)$ is the number of elements of $L$ with $d$ pattern leaves and $p$ place-holders.

Definition 7. We define the composition $L[P]$ of two pattern languages to be the pattern language of trees which are obtained by replacing every place-holder of a tree from $L$ by a tree from $P$.

Given an integer $i$ and a pattern $L$, the pattern $L^{(i)}$ is defined by the following recursion: $L^{(1)} = L$ and $L^{(i+1)} = L^{(i)}[L]$.

Definition 8. A pattern language $L$ is sub-critical for a family $\mathcal{M}$ if the generating function $m(z)$ of $\mathcal{M}$ has a square-root singularity $\tau$, and if $\ell(x,y)$ is analytic in some set $\{(x,y) : |x| \le \tau + \varepsilon,\ |y| \le m(\tau) + \varepsilon\}$ for some positive $\varepsilon$.

Definition 9. Let $L$ be an unambiguous pattern language, $\mathcal{M}$ be a family of trees and $\Gamma$ a subset of $\{x_i\}_{i\ge 1}$ whose cardinality does not depend on $n$. Given an element of $L[\mathcal{M}]$,

(i) the number of its $L$-repetitions is the number of its $L$-pattern leaves minus the number of different variables that appear in the labelling of its $L$-pattern leaves.

(ii) the number of its $(L, \Gamma)$-restrictions is the number of its $L$-pattern leaves that are labelled by variables from $\Gamma$, plus the number of its $L$-repetitions.
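The two counts of Definition 9 can be illustrated by a minimal sketch. It assumes that the variables labelling the $L$-pattern leaves have already been extracted into a list (polarity is ignored, since only variables matter here). The function names and the sample labelling are hypothetical and are not taken from the paper's figures.

```python
def count_repetitions(pattern_leaf_vars):
    """L-repetitions: number of pattern leaves minus number of distinct variables on them."""
    return len(pattern_leaf_vars) - len(set(pattern_leaf_vars))


def count_restrictions(pattern_leaf_vars, gamma):
    """(L, Gamma)-restrictions: pattern leaves labelled in Gamma, plus the L-repetitions."""
    in_gamma = sum(1 for var in pattern_leaf_vars if var in gamma)
    return in_gamma + count_repetitions(pattern_leaf_vars)


# Hypothetical labelling of five pattern leaves (variables only, polarity ignored).
leaf_vars = ["x1", "x3", "x3", "x2", "x3"]
repetitions = count_repetitions(leaf_vars)                  # 5 leaves - 3 distinct variables
restrictions = count_restrictions(leaf_vars, {"x1", "x2"})  # 2 leaves in Gamma + repetitions
print(repetitions, restrictions)
```

With this labelling, five pattern leaves carry three distinct variables, so there are 2 repetitions, and two leaves are labelled in $\{x_1, x_2\}$, giving 4 restrictions.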

Definition 10. Let $\mathcal{I}$ be the family of the trees with internal nodes labelled by a connective and leaves without labelling, i.e. the family of tree-structures.

Figure 2: The tree computes the function x\ v —■ X 2 - 


Figure 3: The pattern is an element of the pattern language N. 


The generating function of $\mathcal{I}$ satisfies $I(z) = z + 2I(z)^2$, which implies $I(z) = (1 - \sqrt{1-8z})/4$, and thus its dominant singularity is $1/8$. Let $I_n$ be the $n$-th coefficient of $I(z)$.
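As a sanity check of the equation $I(z) = z + 2I(z)^2$, the following sketch computes the coefficients $I_n$ by the recurrence obtained from extracting $[z^n]$ on both sides, compares them with the closed form $2^{n-1}\,\mathrm{Cat}_{n-1}$ (a standard consequence of the square-root expression), and looks at the ratio of consecutive coefficients, which should approach $8$, the inverse of the dominant singularity. The function name is illustrative only.

```python
from math import comb


def tree_structure_counts(n_max):
    """I_1..I_n_max from I(z) = z + 2 I(z)^2: I_1 = 1, I_n = 2 * sum_k I_k * I_{n-k}."""
    counts = [0, 1]
    for n in range(2, n_max + 1):
        counts.append(2 * sum(counts[k] * counts[n - k] for k in range(1, n)))
    return counts


def catalan(m):
    return comb(2 * m, m) // (m + 1)


counts = tree_structure_counts(200)
# Each of the n - 1 internal nodes of a binary tree shape carries one of two connectives.
closed_form_ok = all(counts[n] == 2 ** (n - 1) * catalan(n - 1) for n in range(1, 201))
growth_ratio = counts[200] / counts[199]  # tends to 8 = 1 / (1/8) as n grows
print(counts[1:6], closed_form_ok, growth_ratio)
```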


We can, for example, define the unambiguous pattern language $N$ by induction as follows: $N = \bullet \mid N \vee N \mid N \wedge \square$, meaning that a pattern from $N$ is either a single pattern leaf, or a tree rooted by $\vee$ whose two sub-trees are patterns from $N$, or a tree rooted by $\wedge$ whose left sub-tree is a pattern from $N$ and whose right sub-tree is a place-holder. An element of $N$ is represented in Fig. 3. Its generating function verifies $n(x,y) = x + n(x,y)^2 + y\,n(x,y)$ and is equal to $n(x,y) = \frac{1}{2}\left(1 - y - \sqrt{(1-y)^2 - 4x}\right)$. It is thus sub-critical for $\mathcal{I}$.
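The functional equation for $n(x,y)$ can be checked numerically. The sketch below, under the assumption that a truncated fixed-point iteration is an acceptable way to extract coefficients, computes the numbers $N(d,p)$ of patterns with $d$ pattern leaves and $p$ place-holders, then compares the truncated series with the closed form at a small test point. The function name, the truncation degree and the test point are illustrative choices.

```python
from math import sqrt


def pattern_counts_N(max_degree):
    """Coefficients N(d, p) with d + p <= max_degree, by iterating n <- x + n^2 + y*n."""
    series = {}
    for _ in range(max_degree):  # iteration k fixes all terms of total degree <= k
        new = {(1, 0): 1}  # the single pattern leaf, contributing x
        for (d1, p1), c1 in series.items():
            if d1 + p1 + 1 <= max_degree:  # term y * n (pattern "N and place-holder")
                key = (d1, p1 + 1)
                new[key] = new.get(key, 0) + c1
            for (d2, p2), c2 in series.items():  # term n^2 (pattern "N or N")
                if d1 + p1 + d2 + p2 <= max_degree:
                    key = (d1 + d2, p1 + p2)
                    new[key] = new.get(key, 0) + c1 * c2
        series = new
    return series


coeffs = pattern_counts_N(12)
x, y = 0.02, 0.05
truncated = sum(c * x ** d * y ** p for (d, p), c in coeffs.items())
closed_form = (1 - y - sqrt((1 - y) ** 2 - 4 * x)) / 2
print(coeffs[(3, 0)], coeffs[(2, 1)], coeffs[(1, 2)], abs(truncated - closed_form))
```

For instance, the three patterns with two pattern leaves and one place-holder are $(\bullet \wedge \square) \vee \bullet$, $\bullet \vee (\bullet \wedge \square)$ and $(\bullet \vee \bullet) \wedge \square$.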

The tree depicted in Fig. 2 is built from the pattern of Fig. 3. It has 5 $N$-pattern leaves, 2 $N$-repetitions and 4 $(N, \{x_1, x_2\})$-restrictions. It is also built from the pattern of Fig. 4 and has 2 $N[N]$-pattern leaves, and 2 $(N[N], \{x_1, x_2\})$-restrictions.


The following key lemma is a generalization of the corresponding lemma of Kozik [14, Lemma 3.8].

Lemma 5. Let $L$ be an unambiguous pattern, sub-critical for the tree-structures family $\mathcal{I}$. Let $r$ be a fixed positive integer.

(G) Let $A_n^{[r]}$ (resp. $A_n^{[\ge r]}$) be the number of labelled (with at most $k_n$ variables) trees of $L[\mathcal{I}]$ of size $n$ and with $r$ $L$-repetitions (resp. at least $r$ $L$-repetitions).

(E) Let $A_n^{[r]}$ (resp. $A_n^{[\ge r]}$) be the number of equivalence classes of labelled (with at most $k_n$ variables) trees of $L[\mathcal{I}]$ of size $n$ and with $r$ $L$-repetitions (resp. at least $r$ $L$-repetitions).

Then, asymptotically when $n$ tends to infinity, in both models,

$$\frac{A_n^{[r]}}{A_n} = O\left(\mathrm{rat}_n^{\,r}\right) \quad\text{and}\quad \frac{A_n^{[\ge r]}}{A_n} = O\left(\mathrm{rat}_n^{\,r}\right).$$

Figure 4: The pattern is an element of the pattern language $N[N]$.

Proof. First recall that $A_n = I_n \cdot \mathrm{Lab}_{n,k_n}$ in both models.

Model (G). The number of labelled trees of $L[\mathcal{I}]$ of size $n$ and with at least $r$ $L$-repetitions is given by:

$$A_n^{[\ge r]} = \sum_{d=r+1}^{n} I_n(d) \cdot \mathrm{Lab}(n,k_n,d,r),$$

where $I_n(d)$ is the number of tree-structures with $d$ $L$-pattern leaves (among the $n$ leaves) and $\mathrm{Lab}(n,k_n,d,r)$ is the number of leaf-labellings of these trees giving at least $r$ $L$-repetitions. The following enumeration contains some multi-counting and we therefore get an upper bound:

$$\mathrm{Lab}(n,k_n,d,r) \le 2^n \cdot \sum_{j=1}^{r} \binom{d}{r+j} \left\{ {r+j \atop j} \right\} k_n(k_n-1)\cdots(k_n-j+1)\, k_n^{\,n-r-j}.$$

The factor $2^n$ corresponds to the polarity of each leaf (whether the literal is positive or negative); the index $j$ stands for the number of different variables involved in the $r$ repetitions; the binomial factor corresponds to the choices of the pattern leaves that are involved in the $r$ repetitions; the Stirling number corresponds to the partition of the $r+j$ leaves into $j$ parts; the factor $k_n(k_n-1)\cdots(k_n-j+1)$ stands for the choice of the repeated variables, from left to right; finally, the factor $k_n^{\,n-r-j}$ corresponds to the choices of the variables assigned to all remaining leaves. We have

$$\mathrm{Lab}(n,k_n,d,r) \le 2^n k_n^{\,n-r} \cdot \sum_{j=1}^{r} \left\{ {r+j \atop j} \right\} \binom{d}{r+j},$$

in other terms,

$$\mathrm{Lab}(n,k_n,d,r) \le 2^r\, \mathrm{Lab}_{n-r,k_n} \cdot \sum_{j=1}^{r} \left\{ {r+j \atop j} \right\} \binom{d}{r+j},$$

since $\mathrm{Lab}_{n,m} = (2m)^n$ (in model (G)), and

$$A_n^{[\ge r]} \le 2^r \cdot \mathrm{Lab}_{n-r,k_n} \sum_{j=1}^{r} \left\{ {r+j \atop j} \right\} \sum_{d=r+j}^{n} I_n(d) \binom{d}{r+j}. \qquad (7)$$

Let $\ell(x,y)$ be the generating function of the pattern $L$. Note that $\frac{1}{p!}\partial_1^p \ell$ corresponds to pointing $p$ distinct pattern leaves (without order) in the $L$-patterns (where $\partial_1$ stands for the derivative according to the first coordinate). Then, for all $p \ge 0$,

$$\frac{z^p}{p!}\, \partial_1^p \ell(z, I(z)) = \sum_{n\ge 1} \sum_{d\ge p} I_n(d) \binom{d}{p} z^n.$$

Thus,

$$\frac{A_n^{[\ge r]}}{A_n} \le 2^r\, \frac{\mathrm{Lab}_{n-r,k_n}}{\mathrm{Lab}_{n,k_n}} \sum_{j=1}^{r} \left\{ {r+j \atop j} \right\} \frac{[z^n]\, \frac{z^{r+j}}{(r+j)!}\, \partial_1^{r+j} \ell(z, I(z))}{[z^n]\, I(z)}.$$

Since $\partial_1^{r+j} \ell(z, I(z))$ and $I(z)$ have the same dominant singularity because of the sub-criticality of the pattern $L$ with respect to $\mathcal{I}$, the previous sum tends to a constant (because $r$ is fixed) when $n$ tends to infinity and so we conclude, using Propositions 2 and 3,

$$\frac{A_n^{[r]}}{A_n} \le \frac{A_n^{[\ge r]}}{A_n} = O\left(\mathrm{rat}_n^{\,r}\right).$$




Model (E). The number of equivalence classes of labelled trees of $L[\mathcal{I}]$ of size $n$ and with at least $r$ $L$-repetitions is given by:

$$A_n^{[\ge r]} = \sum_{d=r+1}^{n} I_n(d) \cdot \mathrm{Lab}(n,k_n,d,r),$$

where $I_n(d)$ is the number of tree-structures with $d$ $L$-pattern leaves and $\mathrm{Lab}(n,k_n,d,r)$ is the number of leaf-labellings of these trees giving at least $r$ $L$-repetitions. The following enumeration contains some multi-counting and we therefore get an upper bound:

$$\mathrm{Lab}(n,k_n,d,r) \le 2^n \cdot \sum_{j=1}^{r} \binom{d}{r+j} \left\{ {r+j \atop j} \right\} \frac{\mathrm{Lab}_{n-r,k_n}}{2^{n-r}}.$$

The factor $2^n$ corresponds to the polarity of each leaf (whether the literal is positive or negative); the index $j$ stands for the number of different variables involved in the $r$ repetitions; the binomial factor corresponds to the choices of the pattern leaves that are involved in the $r$ repetitions; the Stirling number corresponds to the partition of $r+j$ leaves into $j$ parts; finally, the factor $\mathrm{Lab}_{n-r,k_n}/2^{n-r}$ corresponds to the rest of the partition. Therefore,

$$A_n^{[\ge r]} \le 2^r \cdot \mathrm{Lab}_{n-r,k_n} \sum_{j=1}^{r} \left\{ {r+j \atop j} \right\} \sum_{d=r+j}^{n} I_n(d) \binom{d}{r+j}.$$

Applying the same reasoning as for model (G), starting from Equation (7), allows us to conclude the proof. □

We have finally adapted Kozik’s theory in order to apply it in the new contexts. Since we have extended the 
pattern theory, we are able to use in the following the same key-ideas to describe the probability distributions we 
are interested in. 


6 Behaviour of the probability distribution 


Once we have adapted the pattern theory to our models and proved the central Lemma 5, we are ready to prove our main results, namely Theorems 1 and 2. A first step consists in understanding the asymptotic behaviour of the probability of the function true (model (G)) and of its equivalence class $\langle\mathrm{true}\rangle$ (model (E)).

It is natural to focus on this "simple" function before considering a general class $\langle f \rangle$; and it happens to be essential for the rest of the study. In addition, the methods used to study tautologies (mainly pattern theory) will also be the core of the proof for a general function (model (G)) or a general equivalence class (model (E)).

First, let us introduce a measure in the context of Boolean expressions. Given a family $\mathcal{G}$ of and/or trees (resp. equivalence classes of and/or trees), we define its ratio $\mu_n(\mathcal{G})$ as follows: let $G_n$ be the number of elements of $\mathcal{G}$ of size $n$, then

$$\mu_n(\mathcal{G}) = \frac{G_n}{A_n}.$$


6.1 Tautologies 

First note that true is the unique element of its equivalence class $\langle\mathrm{true}\rangle$.

A tautology is an and/or tree that represents the Boolean function true. By symmetry, the functions true and false have the same probability in both models. Let $\mathcal{T}$ be the family of tautologies. In this part, we prove that the probability of true is asymptotically equal to the ratio of a simple subset of tautologies.

Definition 11 (cf. Fig. 5). A simple tautology is an and/or tree that contains two leaves labelled by a variable $x$ and its negation $\bar{x}$ and such that all internal nodes from the root to both these leaves are labelled by $\vee$-connectives. We denote by $\mathcal{S}$ the family of simple tautologies.
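Definition 11 can be made concrete with a short sketch. It assumes a hypothetical encoding of and/or trees as nested tuples: a leaf is `(variable, is_positive)` and an internal node is `("or" | "and", left, right)`. The code collects the literals that are linked to the root by or-connectives only, and compares the syntactic test with a brute-force truth-table test.

```python
from itertools import product


def or_path_literals(tree):
    """Literals of the leaves linked to the root by or-connectives only."""
    if len(tree) == 2:  # leaf: (variable, is_positive)
        return [tree]
    op, left, right = tree
    if op == "or":
        return or_path_literals(left) + or_path_literals(right)
    return []  # an and-node interrupts the or-path


def is_simple_tautology(tree):
    literals = set(or_path_literals(tree))
    return any((var, not positive) in literals for var, positive in literals)


def variables(tree):
    if len(tree) == 2:
        return {tree[0]}
    return variables(tree[1]) | variables(tree[2])


def evaluate(tree, assignment):
    if len(tree) == 2:
        var, positive = tree
        return assignment[var] == positive
    op, left, right = tree
    l, r = evaluate(left, assignment), evaluate(right, assignment)
    return (l or r) if op == "or" else (l and r)


def is_tautology(tree):
    names = sorted(variables(tree))
    return all(evaluate(tree, dict(zip(names, values)))
               for values in product([False, True], repeat=len(names)))


# x or ((y and not z) or not x): a simple tautology.
simple = ("or", ("x", True), ("or", ("and", ("y", True), ("z", False)), ("x", False)))
# (x or not x) and (y or not y): a tautology whose root is an and-node, hence not simple.
not_simple = ("and", ("or", ("x", True), ("x", False)), ("or", ("y", True), ("y", False)))
print(is_simple_tautology(simple), is_tautology(simple))
print(is_simple_tautology(not_simple), is_tautology(not_simple))
```

The second example shows that the family of simple tautologies is a strict subset of the tautologies, which is why the second half of Proposition 5 requires an argument.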

Proposition 5. The ratio of simple tautologies verifies

$$\mu_n(\mathcal{S}) \sim \frac{3}{2} \cdot \mathrm{rat}_n, \quad\text{when } n \text{ tends to infinity}.$$

Moreover, asymptotically when $n$ tends to infinity, almost all tautologies are simple tautologies, meaning that

$$\mu_n(\mathcal{T}) \sim \mu_n(\mathcal{S}), \quad\text{when } n \text{ tends to infinity}.$$

Figure 5: A simple tautology.


Proof. The proof is divided in two steps. The first one is dedicated to the computation of the ratio $\mu_n(\mathcal{S})$. The second part of the proof shows that almost all tautologies are simple tautologies.

Let us consider the non-ambiguous pattern language $M = \bullet \mid M \vee M \mid \square \wedge \square$. Remark that a tree such that two $M$-pattern leaves are labelled by a variable and its negation is a simple tautology. The generating function of $M$ is $m(x,y) = \frac{1}{2}\left(1 - \sqrt{1 - 4(x + y^2)}\right)$. It is sub-critical for $\mathcal{I}$.

The generating function $f(z) = \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\, m(xz, I(z))\big|_{x=1}$ enumerates and/or trees with two marked distinct leaves linked to the root by or-nodes. Therefore, $DC_n = [z^n]f(z) \cdot \mathrm{Lab}_{n-1,k_n}$ is the number of simple tautologies where simple tautologies realized by a unique pair of leaves are counted once, those that are realized by two pairs of leaves are counted twice, and so on. We have

$$\frac{DC_n}{A_n} = \frac{[z^n]f(z) \cdot \mathrm{Lab}_{n-1,k_n}}{I_n \cdot \mathrm{Lab}_{n,k_n}},$$

and using a consequence of [7, Theorem VII.8] (cf. a detailed proof in [IT]):

$$\lim_{n\to\infty} \frac{[z^n]f(z)}{I_n} = \lim_{z\to 1/8} \frac{f'(z)}{I'(z)}.$$

Note that

$$f(z) = \frac{z^2}{\left(1 - 4(z + I(z)^2)\right)^{3/2}},$$

and thus,

$$\frac{f'(z)}{I'(z)} = \frac{2z}{\left(1 - 4(z + I(z)^2)\right)^{3/2} I'(z)} + \frac{1 + 2I'(z)I(z)}{I'(z)} \cdot \frac{6z^2}{\left(1 - 4(z + I(z)^2)\right)^{5/2}}.$$

Note that, when $z \to 1/8$, $I'(z) \to +\infty$. Moreover, $I(1/8) = 1/4$. Thus,

$$\lim_{z\to 1/8} \frac{f'(z)}{I'(z)} = \frac{3/8^2}{\left(1 - 4(1/8 + 1/16)\right)^{5/2}} = \frac{3}{2}.$$
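The value $3/2$ of this limit can be checked numerically. The sketch below assumes the closed forms $I(z) = (1-\sqrt{1-8z})/4$, hence $I'(z) = 1/\sqrt{1-8z}$, and $f(z) = z^2/(1-4(z+I(z)^2))^{3/2}$, and evaluates $f'(z)/I'(z)$ just below the singularity $z = 1/8$. The function name and the offset $10^{-10}$ are arbitrary choices.

```python
from math import sqrt


def ratio_f_prime_over_I_prime(z):
    """f'(z) / I'(z) for f(z) = z^2 / (1 - 4 (z + I(z)^2))^(3/2), valid for 0 < z < 1/8."""
    s = sqrt(1 - 8 * z)
    I = (1 - s) / 4
    I_prime = 1 / s  # derivative of (1 - sqrt(1 - 8z)) / 4
    D = 1 - 4 * (z + I * I)
    first = 2 * z / (D ** 1.5 * I_prime)
    second = (1 + 2 * I_prime * I) / I_prime * 6 * z * z / D ** 2.5
    return first + second


near_singularity = ratio_f_prime_over_I_prime(1 / 8 - 1e-10)
print(near_singularity)  # close to 1.5
```

The first summand vanishes at the singularity because $I'(z)$ diverges, so only the second one contributes to the limit.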

Thus, we get the upper bound $\frac{3}{2} \cdot \mathrm{rat}_n$ for the ratio of simple tautologies: it remains to deal with the double-counting in order to compute a lower bound.

In $DC_n$, simple tautologies realized by a unique pair of leaves are counted once, those that are realized by two pairs of leaves are counted twice, and so on. Let us denote by $ST_n^{(i)}$ the number of simple tautologies counted at least $i$ times in $DC_n$: we have $DC_n = \sum_{i\ge 1} ST_n^{(i)}$.

Our aim is to remove from $DC_n$ the tautologies that have been over-counted. Therefore, we count simple tautologies realized by three $M$-pattern leaves labelled by $\alpha/\alpha/\bar\alpha$ where $\alpha$ is a literal, and the tautologies realized by four $M$-pattern leaves labelled by $\alpha/\bar\alpha/\beta/\bar\beta$ where $\alpha$ and $\beta$ are two different literals. Let us denote by

$$f_3(z) = \frac{1}{3!}\, \frac{\partial^3}{\partial x^3}\, m(xz, I(z))\Big|_{x=1}$$

the generating function of tree-structures in which three $M$-pattern leaves have been pointed and

$$f_4(z) = \frac{1}{4!}\, \frac{\partial^4}{\partial x^4}\, m(xz, I(z))\Big|_{x=1}$$

the generating function of tree-structures in which four $M$-pattern leaves have been pointed. Then, let

$$DC_n^{(3)} = 3 \cdot \mathrm{Lab}_{n-2,k_n}\, [z^n]f_3(z) \quad\text{and}\quad DC_n^{(4)} = 3 \cdot \mathrm{Lab}_{n-2,k_n}\, [z^n]f_4(z).$$

The integer $DC_n^{(3)}$ (resp. $DC_n^{(4)}$) counts (possibly with multiplicity) the trees in which three (resp. four) $M$-pattern leaves have been pointed, one of them labelled by a literal and the two others by its negation (resp. two of them labelled by two literals associated to two different variables and the two others by their negations). Remark that a tree having six $M$-pattern leaves labelled by $\alpha/\alpha/\bar\alpha/\beta/\beta/\bar\beta$ is counted twice by $DC_n^{(3)}$ and four times by $DC_n^{(4)}$.

For every integer $i$, a simple tautology counted at least $i$ times by $DC_n$ is counted at least $(i-1)$ times by $DC_n^{(3)} + DC_n^{(4)}$. Therefore,

$$ST_n \ge DC_n - \left(DC_n^{(3)} + DC_n^{(4)}\right).$$

In view of Lemma 5,

$$\frac{DC_n^{(3)}}{I_n\, \mathrm{Lab}_{n,k_n}} \sim c_3\, \frac{\mathrm{Lab}_{n-2,k_n}}{\mathrm{Lab}_{n,k_n}} \quad\text{and}\quad \frac{DC_n^{(4)}}{I_n\, \mathrm{Lab}_{n,k_n}} \sim c_4\, \frac{\mathrm{Lab}_{n-2,k_n}}{\mathrm{Lab}_{n,k_n}},$$

where $c_3$ and $c_4$ are positive constants. Then, asymptotically when $n$ tends to infinity, in view of Propositions 2 and 3, $\mu_n(\mathcal{S}) = \mu_n(DC) + o(\mathrm{rat}_n) \sim \frac{3}{2} \cdot \mathrm{rat}_n$.

Let us now turn to the second part of the proof: asymptotically, almost all tautologies are simple tautologies. Let us consider the pattern $N = \bullet \mid N \vee N \mid N \wedge \square$. This pattern is unambiguous, its generating function satisfies $n(x,y) = x + n(x,y)^2 + y \cdot n(x,y)$ and is thus equal to $\frac{1}{2}\left(1 - y - \sqrt{(1-y)^2 - 4x}\right)$. Consequently, $N$ is sub-critical for the family $\mathcal{I}$ of tree-structures.

A tautology has at least one $N[N]$-repetition. Otherwise, we can assign all its $N$-pattern leaves to false and the whole tree computes false: impossible for a tautology.

Consider a tautology $t$ with exactly one $N[N]$-repetition. This repetition must be an $x \mid \bar{x}$ repetition and must occur among the $N$-pattern leaves, using the same kind of argument as above.

Then, let us assume that there is an $\wedge$-node denoted by $\nu$ between the $N$-pattern leaf $x$ and the root of the tree. This node $\nu$ has a left sub-tree $t_1$ and a right sub-tree $t_2$. Necessarily the leaf $x$ appears in $t_1$. Then, one can assign all the $N$-pattern leaves of $t_2$ (which are $N[N]$-pattern leaves of $t$) to false, since there is no more repetition among the $N[N]$-pattern leaves of $t$. Also assign all the $N[N]$-pattern leaves of $t$ minus the sub-tree rooted at $\nu$ to false. Then, we can see that $t$ computes false: impossible. We have thus shown that $t$ is a simple tautology.

In a nutshell, tautologies with exactly one $N[N]$-repetition are simple tautologies, a tautology must have at least one $N[N]$-repetition and, thanks to Lemma 5, tautologies with more than one $N[N]$-repetition have a ratio of order $o(\mathrm{rat}_n)$, which is negligible in front of the ratio of simple tautologies. □

The latter proposition gives us for free the proof for the satisfiability problem. In fact, both dualities (between the two connectives, and between positive and negative literals) transform expressions computing true into expressions computing false, which implies that the probability of false is also equivalent to $\frac{3}{2} \cdot \mathrm{rat}_n$. Moreover, the only expressions that are not satisfiable compute the function false, and the probability of false, being equivalent to $\frac{3}{2} \cdot \mathrm{rat}_n$, tends to 0 as $n$ tends to infinity, which proves Corollary 1.
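For small sizes, the duality between true and false can be checked exactly. The sketch below is an illustration under stated assumptions and not the method of the paper: it works in model (G), encodes a Boolean function on $k$ variables as an integer truth table, and counts by dynamic programming how many and/or trees with $n$ leaves compute each function. The names `function_counts` and `unsat_probability` are hypothetical.

```python
from fractions import Fraction


def function_counts(n_max, k):
    """counts[n][t] = number of and/or trees with n leaves computing truth table t.

    A truth table is an integer whose bit a is the value of the function on the
    assignment encoded by a (bit v of a is the value of variable v).
    """
    full = (1 << (1 << k)) - 1  # truth table of the constant function true
    leaves = {}
    for v in range(k):
        positive = sum(1 << a for a in range(1 << k) if (a >> v) & 1)
        for table in (positive, full ^ positive):  # literal x_v and its negation
            leaves[table] = leaves.get(table, 0) + 1
    counts = [None, leaves]
    for n in range(2, n_max + 1):
        current = {}
        for left_size in range(1, n):
            for t_left, c_left in counts[left_size].items():
                for t_right, c_right in counts[n - left_size].items():
                    for table in (t_left & t_right, t_left | t_right):  # and-root, or-root
                        current[table] = current.get(table, 0) + c_left * c_right
        counts.append(current)
    return counts, full


def unsat_probability(n, k):
    """Exact probability that a uniform and/or tree with n leaves on k variables is unsatisfiable."""
    counts, _ = function_counts(n, k)
    return Fraction(counts[n].get(0, 0), sum(counts[n].values()))


counts, full = function_counts(6, 2)
symmetric = all(counts[n].get(full, 0) == counts[n].get(0, 0) for n in range(1, 7))
print(symmetric, unsat_probability(2, 3), float(unsat_probability(6, 2)))
```

For $n = 2$ the only unsatisfiable trees are $x \wedge \bar{x}$ and $\bar{x} \wedge x$, so the probability is $2k / (2(2k)^2) = 1/(4k)$. These small cases only illustrate the symmetry and are far from the asymptotic regime of Corollary 1.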

6.2 Proofs of Theorems 1 and 2

This last section is devoted to the general result, i.e. to the study of the behaviour of $\mathbb{P}_n(f)$ and $\mathbb{P}_n\langle f\rangle$ for every non-constant Boolean function $f$. The main idea of this part is that, roughly speaking, a typical tree computing a Boolean function $f$ is a minimal tree of $f$ into which a single large tree has been plugged.

In the following, $f$ (resp. $\langle f\rangle$) is fixed, we denote by $r = L(f)$ its complexity, and by $\Gamma_f$ the set of the essential variables of $f$. We also fix $t$ to be an and/or tree computing $f$.

Moreover, we will need the following patterns:

$$N = \bullet \mid N \vee N \mid N \wedge \square,$$

$$P = \bullet \mid P \vee \square \mid P \wedge P,$$

and (see Definition 7 where the composition of patterns is defined)

$$R = N^{(r+1)}[N \oplus P] \quad\text{and}\quad R = N^{(r+1)}[(N \oplus P)^2],$$

where the language $N \oplus P$ is defined such that the $N \oplus P$-pattern leaves of a tree are its $N$-pattern leaves plus its $P$-pattern leaves. It is proved in [14] that this pattern language is indeed non-ambiguous and sub-critical for $\mathcal{I}$ if $N$ and $P$ are non-ambiguous and sub-critical for $\mathcal{I}$.

We have already noticed that assigning all $N$-pattern leaves of a Boolean tree to false makes the whole tree compute false. The pattern $P$ has the dual property: assigning all the $P$-pattern leaves of a tree to true makes the whole tree compute true. This is why these two patterns are so useful in the proof of our main result.
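The two dual properties of $N$ and $P$ can be verified on an example. The sketch below assumes the same hypothetical tuple encoding of trees as nested `("or" | "and", left, right)` nodes with `(variable, is_positive)` leaves. It locates the pattern leaves by their position in the tree, forces them to a constant, and checks by brute force that the value of the whole tree is then determined.

```python
from itertools import product


def pattern_leaf_paths(tree, both_sides_op, path=()):
    """Positions of the pattern leaves: both_sides_op="or" gives N, "and" gives P.

    Below a node labelled both_sides_op both children stay in the pattern;
    below the other connective only the left child does (the right one is a place-holder).
    """
    if len(tree) == 2:
        return [path]
    op, left, right = tree
    paths = pattern_leaf_paths(left, both_sides_op, path + (0,))
    if op == both_sides_op:
        paths += pattern_leaf_paths(right, both_sides_op, path + (1,))
    return paths


def evaluate(tree, assignment, forced, path=()):
    """Evaluate the tree, leaves whose position is in `forced` taking the forced value."""
    if len(tree) == 2:
        if path in forced:
            return forced[path]
        var, positive = tree
        return assignment[var] == positive
    op, left, right = tree
    l = evaluate(left, assignment, forced, path + (0,))
    r = evaluate(right, assignment, forced, path + (1,))
    return (l or r) if op == "or" else (l and r)


# (x and y) or ((z or not x) and not y)
tree = ("or", ("and", ("x", True), ("y", True)),
        ("and", ("or", ("z", True), ("x", False)), ("y", False)))
n_leaves = pattern_leaf_paths(tree, "or")
p_leaves = pattern_leaf_paths(tree, "and")
assignments = [dict(zip("xyz", values)) for values in product([False, True], repeat=3)]
always_false = all(not evaluate(tree, a, {p: False for p in n_leaves}) for a in assignments)
always_true = all(evaluate(tree, a, {p: True for p in p_leaves}) for a in assignments)
print(len(n_leaves), len(p_leaves), always_false, always_true)
```

Forcing leaves by position sidesteps the question of repetitions: when two pattern leaves carry the same variable with opposite polarities, no assignment of the variables can set both to false, which is exactly why repetitions matter in the proofs.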

Proposition 6. A tree $t$ computing $f$ (define $r := L(f)$) with at least one leaf on the $(r+2)$th level of the $R$-pattern must have at least $r+1$ $(R, \Gamma_f)$-restrictions.

Proof. Let us assume that $t$ computes $f$, and has at least one leaf on the $(r+2)$th level of the $R$-pattern but has at most $r$ $(R, \Gamma_f)$-restrictions. Let $i$ be the smallest integer (smaller than $r+2$) such that the number of $(N^{(i)}, \Gamma_f)$-restrictions is equal to the number of $(N^{(i-1)}, \Gamma_f)$-restrictions.

There must be either a repetition or an essential variable in the first level: if there is none, then we can assign all the $N$-pattern leaves to false and this operation does not change the represented function. This function is then the constant function false, which is impossible; so $i \le r+1$.

First case: Let us assume that there are strictly less than $r$ $(N^{(i)}, \Gamma_f)$-restrictions. There is no repetition and no essential variable in the pattern leaves at level $i$. Therefore, we can assign them all to false and make the place-holders of the level $i-1$ compute false. Let us replace those place-holders by false in the tree. Furthermore, replace by false all the remaining non-essential variables. Then simplify the obtained tree in order to remove all the constant leaves false and true. We obtain a tree $t^\star$, which still computes $f$, and whose leaves are all former $N^{(i-1)}$-pattern leaves of $t$ labelled by essential variables. The tree $t^\star$ therefore contains strictly less than $r$ leaves, which is impossible since the complexity of $f$ is $r$.

Second case: Let us assume that $t$ has exactly $r$ $(N^{(i)}, \Gamma_f)$-restrictions. Since $i \le r+1$, there is no restriction in the place-holders of the level $r+2$. Therefore, we can replace the place-holders by wild-cards $\star$, which means that those wild-cards can be evaluated to true or false independently from each other and without changing the function computed by $t$. We can also replace the remaining leaves labelled by non-essential and non-repeated variables by such wild-cards.

We simplify those wild-cards. Such a simplification has to delete at least one non-wild-card leaf. If we deleted a non-repeated essential variable, then the tree $t^\star$ does not depend on this essential variable and computes $f$: this is impossible. Thus, we deleted a repetition: $t^\star$ has strictly less than $R(f)$ repetitions and computes $f$. This is impossible. □

Remark that in Lemma 5 we only count repetitions and not restrictions as was done in the original lemma by Kozik. However, we will need to consider essential variables, and the following lemma allows us to handle them. An expansion of a tree $t$ is a tree obtained by replacing a sub-tree $s$ of $t$ by $s \diamond t_e$ (or $t_e \diamond s$) where $\diamond \in \{\wedge, \vee\}$.

Lemma 6. Let $L$ be an unambiguous pattern, sub-critical for $\mathcal{I}$. Let $f$ be a fixed Boolean function, $\Gamma_f$ the set of its essential variables, and $\mathcal{M}_f$ the set of minimal trees computing $f$. Let $\mathcal{E}$ be the family of trees obtained by expanding once a tree of $\mathcal{M}_f$ by trees having exactly $p$ $(L, \Gamma_f)$-restrictions. Then, there exists a constant $\alpha^{(G)} > 0$ (resp. $\alpha^{(E)} > 0$) such that

$$\mu_n(\mathcal{E}) \sim \alpha^{(G)} \cdot \mathrm{rat}_n^{\,L(f)+p} \quad\text{in model (G)},$$

resp.

$$\mu_n(\mathcal{E}) \sim \alpha^{(E)} \cdot \mathrm{rat}_n^{\,R(f)+p} \quad\text{in model (E)}.$$

Proof. Let $E_n$ be the number of (resp. equivalence classes of) trees of size $n$ in $\mathcal{E}$. We will denote by $i$ the number of leaves that are involved in the $p$ $(L, \Gamma_f)$-restrictions of the expansion tree: $p+1 \le i \le 2p$. Let $\gamma_f$ be the cardinality of $\Gamma_f$.

In the model (G), for all large enough $n$,


E n 


2 p 

Vn(£) = < CSt/ 2 i Z 

n i=p +1 


4 


i(/)] z!d?( £ ( X2,/ ( 0 ))) '*= 


\x=l 


(2 7 / )p(2(fc ra - 7/ ))' 
4(2 KY 


1-L(f)-P 


where $\mathrm{cst}_f = 2L(f) \cdot |\mathcal{M}_f|$ is an upper bound for the number of places in a minimal tree of $f$ where an expansion can be plugged in. Since $L$ is sub-critical for $\mathcal{I}$, there exists a positive constant $\alpha$ such that


z 


i=p +1 


{z n-Hf)y/ adx <(l(x Z J(z))\ x=l 

In 


a 


4-l(/) 

4 



> 0

asymptotically when $n$ tends to infinity. Therefore, in view of Section 4, we have

$$\mu_n(\mathcal{E}) \sim \alpha \cdot \mathrm{rat}_n^{\,L(f)+p}.$$

Figure 6: An expansion at node $\nu$. Note that the expansion tree $t_e$ could have been on the right side of the $\diamond$-connective instead of its left side.

In the model (E), we have, with the same reasoning:


zp 

Hn(£) = < CSt f Y, U 

n i=p +1 


d l 




\x=l 


2P+- R</> . Lab n _ p _ R(f y, kri 
In. ' Lab,, i. 


from which we draw the same conclusion as for the model (G). □

Consider the family $\mathcal{E}'$ of trees obtained by replacing, in a minimal tree of $f$, a sub-tree $s$ by $s \wedge t_e$ where $t_e$ is a simple tautology. Since a simple tautology has at least one $R$-repetition, thanks to Lemma 6, there exist two positive constants $\alpha^{(G)}$ and $\alpha^{(E)}$ such that

$$\mu_n(\mathcal{E}') \sim \alpha^{(G)} \cdot \mathrm{rat}_n^{\,L(f)+1} \quad\text{in model (G)},$$

and

$$\mu_n(\mathcal{E}') \sim \alpha^{(E)} \cdot \mathrm{rat}_n^{\,R(f)+1} \quad\text{in model (E)}.$$

Thanks to Lemma 5, we know that trees computing $f$ with more than $R(f)+2$ repetitions are negligible in front of the above family. Therefore, since trees with no leaf on the $(r+2)$th level are negligible, we have proved weaker versions of Theorems 1 and 2, where the equivalent for the probabilities is replaced by upper and lower bounds of the same order. The rest of the proof consists in sharpening both bounds.

The key point of the proof of Theorems 1 and 2 is that a typical tree computing a function $f$ is a minimal tree of this function which has been expanded once. In the following, we will only consider two different expansions:

Definition 12 (cf. Figure 6). Recall that an expansion of a tree $t$ is a tree obtained by replacing a sub-tree $s$ of $t$ by $s \diamond t_e$ (or $t_e \diamond s$) where $\diamond \in \{\wedge, \vee\}$.

An expansion is a T-expansion if the expansion tree $t_e$ is a simple tautology and the connective $\diamond$ is $\wedge$ (or a simple contradiction and the connective $\diamond$ is $\vee$).

An expansion is an X-expansion if the expansion tree $t_e$ has a leaf linked to the root by an $\wedge$-path (resp. a $\vee$-path) and the connective $\diamond$ is $\vee$ (resp. $\wedge$).
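A T-expansion does not change the computed function, which the following sketch checks by comparing truth tables. It assumes the hypothetical tuple encoding used for illustration (`(variable, is_positive)` leaves, `("or" | "and", left, right)` nodes) and addresses the expanded sub-tree $s$ by its path from the root. The helper names are not from the paper.

```python
from itertools import product


def evaluate(tree, assignment):
    if len(tree) == 2:  # leaf: (variable, is_positive)
        var, positive = tree
        return assignment[var] == positive
    op, left, right = tree
    l, r = evaluate(left, assignment), evaluate(right, assignment)
    return (l or r) if op == "or" else (l and r)


def truth_table(tree, names):
    return tuple(evaluate(tree, dict(zip(names, values)))
                 for values in product([False, True], repeat=len(names)))


def expand(tree, path, connective, expansion_tree):
    """Replace the sub-tree s found at `path` (0 = left, 1 = right) by (s connective t_e)."""
    if not path:
        return (connective, tree, expansion_tree)
    op, left, right = tree
    if path[0] == 0:
        return (op, expand(left, path[1:], connective, expansion_tree), right)
    return (op, left, expand(right, path[1:], connective, expansion_tree))


names = ["x", "y", "z"]
t = ("and", ("x", True), ("y", True))                   # computes x and y
tautology = ("or", ("z", True), ("z", False))           # simple tautology
contradiction = ("and", ("z", True), ("z", False))      # simple contradiction

t_expanded = expand(t, (1,), "and", tautology)          # T-expansion at the leaf y
t_expanded_dual = expand(t, (), "or", contradiction)    # dual T-expansion at the root
not_an_expansion_of_f = expand(t, (), "or", tautology)  # wrong connective: computes true

same = truth_table(t_expanded, names) == truth_table(t, names)
same_dual = truth_table(t_expanded_dual, names) == truth_table(t, names)
changed = truth_table(not_an_expansion_of_f, names) != truth_table(t, names)
print(same, same_dual, changed)
```

The last example shows why the connective is part of the definition: plugging a tautology under an or-connective collapses the sub-tree to true instead of leaving it unchanged.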

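Corollary 2. The ratio of the (resp. equivalence classes of) minimal trees of $f$ expanded once satisfies the following: there exist two positive constants $\lambda_f$ and $\lambda_{\langle f\rangle}$ such that, asymptotically when $n$ tends to infinity,

$$\mu_n(E[\mathcal{M}_f]) = \lambda_f \cdot \mathrm{rat}_n^{\,L(f)+1} + o\left(\mathrm{rat}_n^{\,L(f)+1}\right) \quad\text{in model (G)},$$

$$\mu_n(E[\mathcal{M}_f]) = \lambda_{\langle f\rangle} \cdot \mathrm{rat}_n^{\,R(f)+1} + o\left(\mathrm{rat}_n^{\,R(f)+1}\right) \quad\text{in model (E)}.$$

This corollary is a direct consequence of Lemma 6.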






Lemma 7. Let $f$ be a fixed Boolean function and $\mathcal{M}_f$ the set of minimal trees of $f$. Then, in model (G),

$$\mathbb{P}_n(f) \sim \mu_n(E[\mathcal{M}_f]) \quad\text{when } n \to +\infty,$$

and, in model (E),

$$\mathbb{P}_n\langle f\rangle \sim \mu_n(E[\mathcal{M}_f]) \quad\text{when } n \to +\infty.$$

Proof. Let $t$ be a tree computing $f$. Such a tree must have at least $R(f)+1$ $R$-repetitions. Moreover, thanks to Lemma 5, trees with at least $R(f)+2$ $R$-repetitions are negligible. We will show that a tree with exactly $R(f)+1$ $R$-repetitions is in fact a minimal tree expanded once.

The term $t$ must also have $R(f)+1$ repetitions with respect to the pattern $N^{(r+1)}[(N \oplus P)^2]$ and therefore, there is no additional repetition when we consider the $(r+3)$th level of this pattern.

Let $i$ be the first level such that the number of $(N^{(i)}, \Gamma_f)$-restrictions is equal to the number of $(N^{(i-1)}, \Gamma_f)$-restrictions. Since there must be a restriction on the first level, $i \le r+1$.

First case: Assume that an essential variable $\alpha$ appears on the pattern leaves of the $(r+3)$th level. Therefore, $t$ has at most $L(f)$ $(N^{(i)}, \Gamma_f)$-restrictions. Let us replace the place-holders of the $(i-1)$th level by false and assign all the remaining non-essential variables to false. Simplify the tree to obtain a new and/or tree denoted by $t^\star$. The leaves of this tree are former $N^{(i-1)}$-pattern leaves of $t$, labelled by essential variables, and $t^\star$ still computes $f$. But the variable $\alpha$ is essential for $f$: thus it must still appear in the leaves of $t^\star$, and by deleting its occurrence in the leaves of the $(r+3)$th level, we deleted one repetition. Therefore, $t^\star$ has at most $L(f)-1$ leaves, which is impossible.

Second case: There is no essential variable among the pattern leaves of the $(r+3)$th level. Since there is also no repetition at this level, we can replace the place-holders of the level $(r+3)$ by wild-cards. We also replace the remaining non-essential and non-repeated variables by wild-cards. We then simplify the wild-cards and obtain a simplified tree $t^\star$, computing $f$, with no wild-cards and whose leaves are former leaves of the tree $t$, essential or repeated. During the simplification process, we have deleted at least one of these leaves and therefore $t^\star$ has at most $L(f)$ leaves: it is a minimal tree of $f$.

Let us consider the following fact: the lowest common ancestor of all the wild-cards in $t$ has been suppressed during the simplification process. Assume that this fact is false: then two wild-cards have been simplified independently during the simplification process, and thus, at least two essential or repeated variables have been deleted. The tree $t^\star$ has thus at most $L(f)-1$ leaves and computes $f$, which is impossible since $L(f)$ is the complexity of $f$. Let us denote by $t_e$ the sub-tree rooted at $\nu$, the lowest common ancestor of the wild-cards. Thus a typical tree computing $f$ is a minimal tree of $f$ in which we have plugged a specific expansion tree $t_e$. □

Lemma 8. Let $t$ be a typical tree computing $f$. The expansion tree $t_e$ is either a simple tautology (or simple contradiction), or an X-expansion, i.e. a tree with one $\wedge$-leaf (resp. $\vee$-leaf) labelled by an essential variable of $f$.

Proof. As shown in the former lemma, a typical tree computing $f$ is a minimal tree of $f$ on which has been plugged an expansion tree $t_e$.

First case: Let us assume that $t_e$ has no $(N \oplus P)$-repetition and no essential variable among its $(N \oplus P)$-pattern leaves. Then, we can replace $t_e$ by a wild-card and simplify this wild-card. This simplification suppresses at least one other leaf of the tree: the obtained tree is then smaller than the original minimal tree, and still computes $f$. This is impossible.

Second case: Let us assume that $t_e$ has at least two $((N \oplus P)^2, \Gamma_f)$-restrictions. Thanks to Lemma 6, this family of expanded trees is negligible.

Third case: Let us assume that $t_e$ has exactly one $((N \oplus P)^2, \Gamma_f)$-restriction. Then it must be a $(N \oplus P, \Gamma_f)$-restriction (see First case).

• if it is a repetition, then one can show that $t_e$ must be a simple tautology or a simple contradiction.

• if it is an essential variable, one can show that it must be an X-expansion.

□

7 Conclusion

In this paper, we have generalised the Catalan tree distribution on Boolean functions following two directions:

• letting the number of variables and the size of the Boolean trees tend to infinity together, which has allowed us to answer a fundamental satisfiability problem;

• introducing a natural equivalence relation on Boolean trees and functions, which exhibits a very interesting threshold/saturation phenomenon for which we have no intuitive explanation up to now.

It is interesting to see that these two models can be analysed with very similar methods, namely, the ones used in the literature to study the classical Catalan tree model: Analytic Combinatorics and Kozik's pattern theory. The key idea that allowed us to generalise those methods to our two new models was to dissociate the shapes of the trees and their leaf-labelling.

We strongly believe that our methods could be generalised further, for example to other logical systems (such as the implication model, see e.g. IS EH), or to non-binary or non-planar uniform trees (see IIP]). Our confidence relies on the fact that those models, in the case of a constant sequence $(k_n)_{n\ge 1}$, can be analysed with analytic combinatorics and pattern theory (or tools based on the same key ideas) as well, and we have shown here how to generalise those methods to a more general sequence $(k_n)_{n\ge 1}$.

A more challenging generalisation would be to consider different probability distributions on binary plane trees. For example, in view of mm we conjecture that the random binary search tree of size $n$, labelled with $k_n$ variables, defines a very interesting satisfiability problem, with a phase transition à la $k$-SAT. It would be very interesting (but, we expect, non-trivial) to prove such a conjecture. Even more challenging would be to ask what effect the introduction of the equivalence relation has on this phase transition.
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